As the introductory art course in the University of Minnesota’s
General College has been developed during a six-year period, many
questions have arisen. How should student progress be measured?
Are the conventional essay or objective tests adequate? Of what value
is the laboratory method of teaching art? Should all students try to
create art objects? What are the purposes and values of lectures?

Our testing program in art, which is part of the General College
testing program developed by Dr. Alvin C. Eurich and his associates,
has been devised to give us a better understanding of these and many
other issues.24 So far data from our tests have helped to clarify some
issues and they have also raised many new ones. We believe that the
tests in themselves have no value or interest unless they contribute to
our understanding of art activities and art teaching. In no sense is the
program finished, but in it we have taken the first steps toward a
sharper understanding and more clearly objective evaluation of our
teaching problems.


THE UNDERLYING PHILOSOPHY 


Any evaluation program is necessarily based on some philosophy,
and the clarity and definiteness with which that philosophy is stated is
highly important to the success of the project. In the General College
we have interpreted art activities as phases of normal human
behavior, not basically different from other activities.®-!°-%:21_ Thus,
art creation seems to be a type of problem-solving similar to creative
activity in physics, law, or medicine. To be sure, the problems and
the materials employed are not the same, but the creative process in
art exhibits few, if any, unique characteristics. Similarly, art appreciation
is apparently a type of perception involving a recognition of
merit in art and a feeling of enjoyment. Working from this basis we
can describe in varying degrees of specificity what a person does when
he engages in an art activity. After defining art activities in behavioral
terms, the test-maker finds it necessary to devise situations (or test 
items) which will show the presence or absence of that particular 
behavior, and combine these into tests in the customary manner.’:!8.21 

There are some artists and psychologists who believe that it is 
impossible to measure art activities objectively. To them art meas- 
urement seems a contradiction in terms, for is not art concerned with 
the infinite, the intangible? Why try to measure the immeasurable? 
Then, too, some experts in the fields of art and psychology declare that 
the measurement of art activities is undesirable because the very act of 
measurement destroys the most important quality of the art experience 
which, to them, is a delicate feeling easily destroyed. On the other
side are those experimentalists who argue that if anything exists, it can
be measured. No one would deny that art products and art activities 
exist. As yet these skeptical experimentalists may be puzzled by the 
complexity of the problem, but they are hopeful that continued work 
will bring results. 

With a few notable exceptions the debate has been pursued as a 
matter of theory. The classroom teachers, the artists, the psychologists
all have answers to the question: “Can human behavioral
patterns in the field of art be identified, changed, and the changes
measured?” but few have data with which to support their opinions.
It would seem the sensible procedure to attack the problem of art
measurement experimentally, to devise as many tests as seem worth
while, and to keep an open mind regarding the results until sufficient
data have been collected. There is little value in attempting to pre- 
dict in advance the limits beyond which art measurement cannot go. 
Experimentation will provide the answer. 


TESTS AND TECHNIQUES 


In the General College’s testing program the outcomes of instruc- 
tion have been classified into four divisions: Vocabulary, Knowledge of 
Facts and Principles, Application of Facts and Principles, and Attitudes. 
The tests and techniques which have been invented or developed, as 
well as the standardized tests which have been found useful, are 
described below under these four headings. Some of them have been 
well developed; more are still in experimental stages. They are 
reported primarily as suggestions to others who have similar problems. 

(1) Vocabulary.—Knowledge of vocabulary is perhaps easier to 
measure in the conventional manner than any other outcome of art 
teaching, inasmuch as the widely used, new-type testing techniques are
useful. To be sure, the task of measuring vocabulary adequately has 
hardly been started, but it is simple to define one term by selecting its 
synonym from a list of words. Three typical techniques may be men- 
tioned. Multiple-choice items show the same advantages in measuring 
knowledge of art vocabulary that they show in measuring other types of 
information. Matching items permit the sampling of a large number 
of words quickly and are highly reliable. Completion items, especially 
of the controlled type, are efficient in that guessing is almost entirely 
eliminated, but they are, unfortunately, time-consuming. 

The great disadvantage inherent in all of these techniques is that 
one word is defined by other words, and thus the test items are entirely 
verbal. However, art products are essentially plastic, not verbal. It 
is less important to know the dictionary definition of an art term than 
it is to be able to apply the term correctly to a work of art. For exam- 
ple, many art students can quote in parrot fashion definitions of 
Romanesque and Renaissance architecture, and yet when they see
buildings of the two styles, they may fail to distinguish one from the 
other. It seems to us more important to use art terms properly, to 
recognize the quality or qualities which the term describes, than merely 
to recognize a definition or a synonym. 

To overcome this fault in the conventional measures of art vocabu- 
lary we have experimented with an Art Vocabulary Test. The test 
stimulus used is an original art object or a photograph. The subject is 
asked to select from a list of art terms those words which describe 
qualities or characteristics of this particular work of art. The list of 
terms is classified under several major headings. Under Type of Art
such terms as “naturalistic,” “realistic,” “abstract” and “formal”
appear; under Total Effect such terms as “dramatic,” “melodramatic,”
“sentimental,” “depressing,” “joyous”; under Organization,
“triangular,” “vertical,” “static,” “complex,” etc. A tentative key
has been prepared for a Minneapolis skyscraper, a Gothic cathedral, a 
Cubist painting, an Oriental statue, an Impressionist painting, and 
an advertisement. Additional keys could be prepared for any work of 
art deemed suitable. Inasmuch as the Art Vocabulary Test measures 
a type of activity useful in every-day life, it promises to be a better 
measure of vocabulary than the purely verbal tests. 

(2) Knowledge of Facts and Principles.—Like knowledge of vocab- 
ulary, knowledge of facts and principles is comparatively easy to meas- 
ure. The multiple-choice technique is the one most readily adaptable 
to a large number of situations, but completion, true-false, and matching
items are all useful. In an attempt to emphasize related knowledge
we have also used the double matching item. For example, in dealing 
with personalities in the field of art, the first column in the test item 
may contain the names of artists, the second column the classification 
to which the artist belongs, and the third a work of art by the artist. 
This is illustrated by the following example:


A, artist            B, school              C, example of work
[illegible] ......   American Scene         “Card Players”
[illegible] ......   Cubist                 “Haystacks”
[illegible] ......   Impressionist          “Still Life with Blue Vase”
[illegible] ......   Post-Impressionist     “Woman with Mandolin”
[illegible] ......   etc.                   etc.


The subject places the proper number in each blank. This type of item 
may also be readily applied to the measurement of design principles, 
color, materials, etc. 

For those special situations in which it seems worth while to have 
the students thoroughly master an assigned reading, we have devised 
an “outline test.” The content in the assigned reading is carefully
outlined, and then some of the statements are deliberately falsified. 
The student is asked whether each statement is true or false, and, if 
false, which word makes it so. Such a test does not permit creative 
thinking on the student’s part, but it enables the instructor to determine 
how much of the material has been understood and remembered. 

Although it is customary to use a verbal stimulus in the objective 
testing of knowledge of facts and principles, there are other possible 
stimuli in the arts, and the use of an actual art object or a photograph is 
generally desirable. More than most fields of education, art has suf- 
fered from over-verbalization or, stated more correctly, from the dis- 
sociation of the verbal and plastic. Some art courses, as, for example, 
the typical lecture courses on the history of Fine Arts, are entirely 
verbal, and the test questions in these courses are usually asked and 
answered entirely in words. In other art courses, such as those in 
creative drawing or painting, verbal explanations of plastic problems 
are used too little. Although such specialized instruction as the over- 
verbalized art history may be suited to the scholar and the under- 
verbalized participative work to the practising artist, neither directly 
meets the layman’s needs. The layman wants to relate his knowledge
and his conversation to real works of art in his own environment. 
In all probability he will neither engage in long, theoretical discourse 
about art nor will he work directly with art mediums. More likely he 
will discuss a painting while looking at it, comment on a building as 
he passes it, talk about the furniture he is buying. He talks while he 
sees. 

To make art tests more vital by making the tests deal directly with 
the materials of the course and to arrange conditions similar to those in 
real life, a photograph of a work of art—a building, a painting, a textile 
—may be projected on the screen as the test stimulus. The subject 
answers questions of any type desirable—multiple choice, completion, 
etc.—which refer directly to this work of art. For example, a photo- 
graph of Chartres Cathedral may be shown. The items can relate to 
the style of the building, the approximate date, the country to which it 
belongs, the method of construction, the general design, the total effect, 
the use of decorative elements, etc. The responses may be stated in 
conversational terms much as they would occur in daily life. An 
example of this type is: 


1. Which statement would you make about the building shown? 


____a. “It was copied from earlier work in England.”
____b. “It was copied from Romanesque work.”
____c. “It is an example of eclectic architecture.”
____d. “It is a good example of French Gothic architecture.”
____e. “It is a Classical building.”


Admittedly the use of a visual rather than a verbal stimulus makes 
the administration of a test more complicated, but if instruction in the 
course emphasizes contact with art objects, the evaluation of the stu- 
dent’s development should do the same. 

(3) Application of Facts and Principles.—In the General College art 
courses great emphasis is placed on direct application of the knowledge 
acquired in the course to the every-day art problems of the students. 
The General Arts course was organized primarily because the students 
buy and arrange art objects such as clothes and furniture; because their 
lives will be richer if they appreciate objects of art such as, for exam- 
ple, paintings, sculpture and advertisements; and because they will 
find it worth while to discuss art problems. These are three basic 
needs around which the course has been built. We are less concerned 
with the student’s acquisition of knowledge for its own sake than with 
knowledge which is useful. Unused and unusable knowledge is an
extravagance which few students can afford today.

Many and varied techniques are already available for measuring the 
student’s ability to apply what he knows about art. Some of these 
techniques are identical with or similar to the devices used in educa- 
tional tests constructed for other fields of instruction. However, there 
are some outcomes of art instruction for which these standard tech- 
niques are not suited. Both old and new techniques are discussed 
below. 


(a) Conventional Objective Testing Techniques 


Of all typical techniques the multiple-choice item probably offers 
the greatest possibilities for measuring the degree to which the student 
is able to apply the information he has gained. For example, in dis- 
cussing the Graphic Processes it may be desirable to have the student 
know that lithography is a process of printing from stone slabs or 
metal plates, that it produces rich black and ‘mealy’ gray tones, etc. 
These purely factual items would be measured under the category of 
Knowledge of Facts and Principles. But if such knowledge is to be of 
greatest value to the student, he must apply it so that his understand- 
ing and appreciation of lithographic prints is increased. One method 
of measuring the application of such knowledge is to describe in some 
detail a natural scene suggestive of lithography which an artist might 
use as subject-matter for a print, and then determine whether or not 
the student would choose lithography as the most suitable medium. 
An example of this type of application item follows:


1. An artist has been impressed with the view along the water front
of a city as he walked home early in the evening. The buildings
stand out as large, simple masses, the water shows a rich, almost
greasy pattern. Everything looks black and grey. Which of the
following media do you think would be most suitable for representing
this scene? (a. aquatint, b. drypoint, c. etching, d. lithography,
e. pen and ink.)


Similar items have been developed for water color, oil, fresco, 
tempera painting, etc. 

In testing the student’s ability to apply his knowledge of archi- 
tectural materials, we may ask him why steel is better than stone for 
beam construction. Or, to measure the application of knowledge of 
color and design, a multiple-choice item may present as a situation a 
room which has some obvious fault (the ceiling may be too high or too
low, the room may appear too large or small, etc.) and the multiples 
will be suggestions for remedying the appearance of the room. The list 
could be extended indefinitely, but the examples described above show 
a few of the possible applications of the multiple-choice technique. 

If the test situation is presented visually, several questions may 
refer to one picture. Then the items may vary so that they measure 
not only Application of Facts and Principles, but Vocabulary, Knowledge
of Facts and Principles, and Attitudes as well. For example, a
photograph of the Paul Revere House, Boston, may be projected on a 
screen, and the following items used: 


( ) 1. The style of this house is (1. Colonial, 2. Post-Colonial, 3. Early 
American, 4. Greek Revival, 5. Eclectic). 

( ) 2. The overhanging second story is (1. Functional, 2. The result of the 
method of construction, 3. For artistic purposes, 4. A carry-over 
from an earlier necessity, 5. A Renaissance device). 

( ) 3. The diagonal window panes were developed in the architecture of 
(1. America, 2. The Renaissance, 3. The Baroque period, 4. The 
Mediaeval period, 5. The Roman period). 

( ) 4. It is the House of (1. Seven Gables, 2. Paul Revere, 3. Samuel
McIntyre, 4. John Quincy Adams, 5. Betsy Ross). 

( ) 5. The room at the Minneapolis Art Institute belonging to this period 
has (1. A highly-polished floor, 2. Chintz draperies, 3. Crystal lights, 
4. One wall of wood and three of plaster, 5. Brick floor). 

( ) 6. The gardens built with such houses were derived from (1. Italy, 2. 
Holland, 3. Germany, 4. France, 5. England). 

( ) 7. The pendants at the corners are (1. Functional, 2. Structural, 3. 
Decorative, 4. Naturalistic, 5. Renaissance). 

( ) 8. The furniture in the house is (1. Simple and strong, 2. Very com- 
fortable, 3. Highly carved, 4. Well upholstered, 5. Of metal). 

( ) 9. A house of this type is (1. Well suited to contemporary needs, 
2. In harmony with our present social order, 3. Worth studying as 
an expression of one period in American life, 4. Not worth studying 
in the Twentieth Century). 

( ) 10. Which of the following most nearly describes your reactions to this 
house? (1. No person should build a house like this today. 2. I 
would like to live in this house. 3. This house is all right for some 
people but not for me. 4. I wish that everyone lived in houses 
similar to this one.) 


However, useful as these conventional techniques are, they are not
sufficient to evaluate satisfactorily student progress in a course which
aims to develop a broad appreciation of art. Our objectives have
necessitated other types of evaluation procedures to give a clear picture 
of the degree to which we are achieving our aims. Some of these other 
testing techniques have been borrowed or adopted, some have been 
developed at the General College. The most useful are described 
below. 


(b) Order of Merit Tests 


General courses in art appreciation invariably aim to develop the 
student’s ability to recognize merit in existing art objects. Sometimes 
this objective is stated directly. At other times it is stated less
directly, as “to improve taste,” “to develop sensitivity to good
design,” etc. Whatever the words used, the idea remains the same: to
recognize merit or value in a work of art. This objective can be
measured by several techniques. First is the method of paired compari- 
sons in which each item is compared with every other item to determine 
its value or merit. Although used in psychophysical studies, the 
method is tedious for both subject and examiner, and its usefulness to 
art teachers is, therefore, limited. Second is the rank order method in 
which all items in a given series are placed in order from best to poorest. 
In a test of sufficient length to be reliable this method often demands 
discriminations of unreasonable sharpness. For example, in one of the 
early studies!! dealing with appreciation of design, the subjects were 
asked to rank in order fifty Oriental rugs, a task which is next to 
impossible for even the most discriminating expert. Finally, there is 
the order of merit technique in which the objects to be evaluated are 
grouped in sets of two to four (or more), and the subject is asked to 
arrange these items in order of excellence. The advantages of this 
technique have been well described by several psychologists who have 
constructed art tests, the method having been used by Meier and 
Seashore,'® McAdory,'4:!5 and Christensen.*:> 
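
The tedium of the paired-comparison method can be made concrete by counting judgments: a series of n items requires n(n-1)/2 pairwise comparisons. The short sketch below is an illustration added for the reader, not part of the studies cited; the function names and the choice of sets of three are assumptions. It counts the judgments needed for a series of fifty items (the size of the rug series mentioned above) and, for contrast, the number of sets needed when the same items are grouped in threes.

```python
from math import comb


def paired_comparison_judgments(n_items: int) -> int:
    """Pairwise judgments needed when every item is compared with every other."""
    return comb(n_items, 2)


def grouped_set_count(n_items: int, set_size: int) -> int:
    """Sets needed when items are shown in groups (the last set may be smaller)."""
    return -(-n_items // set_size)  # ceiling division


print(paired_comparison_judgments(50))  # 1225
print(grouped_set_count(50, 3))         # 17
```

The contrast is only a rough one, since a grouped test ranks items within each set and does not by itself order the whole series.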

Two standardized tests of this type—the Meier-Seashore Art 
Judgment Test and the McAdory Art Test—have been used to some 
advantage in measuring the art judgment ability of General College 
students. They have value in determining the ability of our students 
in relation to the norms published by the authors of each test. How- 
ever, they are of little value in measuring student growth. Although 
the tests indicate that students make some progress while enrolled in 
art courses, they do not reveal any marked improvement of the aspect 
of art judgment measured by these tests. The relatively small gains
on these tests may indicate (1) that the ability measured by these tests is
not subject to much improvement through training, (2) that the
teaching is at fault, or (3) that the measuring instrument
itself is not sufficiently refined to measure such changes. The evidence
on this issue is not decisive, but the fact remains that the Meier- 
Seashore Art Judgment Test and the McAdory Art Test are not of 
great value in measuring student progress in an introductory college art 
appreciation course, and that there is striking need for other tests of 
this general type. Similar results have been reported by Wentworth.” 

To meet this need four discrimination tests have been constructed 
by the writer. One test has been developed for each of the following 
fields of art: Industrial Art (called the Owatonna Art Judgment
Test), Painting, Sculpture, and Architecture. A fifth test in abstract
design is now being constructed. Following the order of merit technique,
the tests are composed of sets of photographs of three similar
objects mounted on heavy cardboards approximately twelve by fifteen
inches. For example, one set in the Painting Test shows three landscapes,
reproduced in color, while one set in the Owatonna Art Judgment
Test shows three vases. Like the Meier-Seashore and McAdory
tests, the new tests require the subject to rank in order the items in 
each set. It has been our aim in these tests to make the representa- 
tions of the art objects as true to the originals as inexpensive methods 
of printing permit. Since drawings and diagrams are abstractions 
of the original work of art and fail to show much of the peculiar quality 
which makes the art object good or poor, the new tests are composed of 
photographic representations. Furthermore, the subject is asked to 
evaluate the art objects from all aspects, not from the viewpoint of 
composition alone, as in some tests of art judgment. These tests 
extend the range of art judgment abilities which may be measured, and 
make possible such studies as the relationship between judgment in 
different fields of art, the relationship between knowledge and art 
judgment, and the relative improvement in the ability to judge different
types of art objects with different course content or methods.


(c) Congruity Test 


The Congruity Test® developed for use in the General College pur- 
ports to measure the subject’s sensitivity to harmony or congruity in 
domestic architecture and house-furnishings. The test is composed of 
thirteen pages of photographs eight by ten inches bound together 
between stiff covers. The first page, which shows exteriors of four
houses of different styles, is mounted so that it remains visible all the
while the test is being taken. Each of the pages that follow has photographs
of parts of houses; for example, one page shows six doorways,
one shows six living-rooms, and others present gardens, bedrooms,
tableware, etc. The subject chooses the item from each page (a
doorway, living-room, garden, etc.) which is harmonious with each of
the four house exteriors on the first page. Thus the test presents the
subject with photographs of four house exteriors as the test situation, 
and he is required to select a doorway, garden, stairway, living-room, 
dining-room, table setting, dinnerware, chest of drawers, bedroom, 
painting, and textile for each house. Like the order of merit tests, the 
Congruity Test deals with an activity related to everyday art problems. 
Relatively short, it requires from twenty to thirty minutes to adminis- 
ter, but seems to be a reliable and valid measure of an important 
outcome of art teaching. 


(d) The Minnesota House Design and House Furnishing Test 


This test has recently been prepared by the Home Economics 
Department at the University of Minnesota and is published by the 
University of Minnesota Press.2 The test is composed of one plate 
showing photographs of the exteriors, living-rooms and bedrooms of 
two Colonial houses, and a test blank with one hundred twenty-nine 
items. The items refer to many phases of house design and furnishing, 
such as the design quality of different pieces of furniture and their 
harmony with other elements. One section of the test is devoted to 
suggestions for improving both the exteriors and the interiors of the 
houses. Although the test does not take into consideration the basic 
planning of the house, it is highly useful because it deals with choices 
and changes which the occupants of typical houses can readily make. 
The authors of the test present data showing that significant improve- 
ment occurs with proper training. Unfortunately, the test was too 
easy for the General College students enrolled in the second quarter 
of the General Arts course, but it might be well adapted to their 
abilities during the first quarter. 


(e) Work Tests 


All of the evaluation techniques described above require the student 
to make a choice from given items, and all have the unquestionable 
advantage of purely objective scoring methods. However, such tests 
have not yet been developed to the stage where they measure all of the
important outcomes of education, especially in the field of art. It is
desirable to have the student not only recognize a correct or “best”
response, but actually develop the reasons himself. For example, it is 
one thing to select the best from five statements concerning a painting, 
and quite another to originate an intelligent comment on the painting. 
While the correlation between these two abilities would probably be 
high, it is not safe to assume that such a fact is true until more data are 
obtained. Thus, it is worth while to experiment with techniques which 
may not at first satisfy all of the conditions of objective measurement, 
but which measure important outcomes of education in the arts. 

One such technique developed in the General College is a series of 
mimeographed Work Tests designed to give the student practice in 
making discriminating judgments and in stating his own reasons for his 
choices. Each Work Test is eight and one-half by eleven inches and 
shows three variations of the same object, the objects ranging from 
houses and landscapes to abstract designs. One of the three is of good 
design, one mediocre, and one poor. The subject selects the best and 
poorest, and, further, states his reasons for choosing each. Thus the 
Work Tests differ from the order of merit tests only in requiring the 
subject to state his reasons. 

Although there is some disadvantage in confusing testing and teach- 
ing devices, the Work Tests may be used as either, or as a combination 
of both, for they provide an opportunity for the student to make 
choices, to state reasons, and, more important, to have both choices and 
reasons criticized by the instructor. When such tests are scored and 
returned to a small discussion group (or even a large lecture section),
they offer an excellent basis for class discussion and individual student 
growth. 

The chief advantage of using the Work Tests in this way is that the 
issues concerning art judgment are brought into definite focus. 
Although nearly all art teachers aim to develop “discriminating judgment,”
few have stated clearly and definitely the basis on which such
judgments are to be made. The students are often left wondering
why this painting is called good, that one bad, and, consequently, they
continue their faulty habits of thinking and fail to develop their discriminating
power directly and efficiently. The activities suggested
by the Work Tests—making judgments, stating reasons, grading and 
discussing judgments and reasons—can clarify art judgment for both 
students and teacher. 


(f) Work Sheets


As a means of measuring the ability to solve simple art problems, a 
series of mimeographed Work Sheets has been developed at the General 
College. Each Work Sheet presents (1) a problem to be solved, (2) the 
materials required, (3) a list of principles which the student is to 
observe and on which he will be graded, and (4) a drawing presenting 
the basic outline of the problem. For example, one of these Work 
Sheets deals with Furniture Arrangement. The statement of the 
problem is: “To arrange the furniture in a modern living-room.”
Colored pencils, water color, or colored papers may be used in solving
the problem. Guiding principles are listed as follows:


                                                     Excellent   Medium   Poor

Circulation should be easy and direct                ____        ____     ____
Furniture should be related to walls and openings    ____        ____     ____
Appropriate furniture should be selected             ____        ____     ____
Furniture should be in scale with the room           ____        ____     ____
Furniture should be grouped pleasingly               ____        ____     ____
The ‘Principle of Design’ should be applied
  a. Balance the furniture and accessories           ____        ____     ____
  b. Emphasize the important features                ____        ____     ____
  c. Relate the objects rhythmically                 ____        ____     ____
Colors should be suitable                            ____        ____     ____
Rendering should present the arrangement clearly     ____        ____     ____
Arrangement should show some originality             ____        ____     ____


Below this is an outline drawing of the plan and one wall of a living-room
drawn at three-eighths scale. The student draws on this paper so
that the statement of principles is always before him. This work is 
rated on each principle on a three-point scale as excellent, medium, or 
poor. Numerical values can be established for each scale step, and 
the student’s achievement stated in numerical terms. The principles 
can be weighted in any way desired in order to place greater emphasis 
on certain factors. 
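
The scoring just described can be sketched as a weighted average of scale values. The example below is a minimal illustration under stated assumptions, not the scoring actually used in the General College: the numerical values 3, 2, and 1 for excellent, medium, and poor, the abbreviated principle names, the weights, and the function name are all hypothetical.

```python
# Hypothetical numerical values for the three scale steps (an assumption).
SCALE_VALUES = {"excellent": 3, "medium": 2, "poor": 1}


def work_sheet_score(ratings, weights=None):
    """Return the weighted mean scale value over all rated principles.

    ratings: dict mapping principle -> "excellent", "medium", or "poor"
    weights: optional dict mapping principle -> positive weight (default 1.0)
    """
    if not ratings:
        raise ValueError("at least one principle must be rated")
    weights = weights or {}
    total = 0.0
    weight_sum = 0.0
    for principle, rating in ratings.items():
        w = weights.get(principle, 1.0)
        total += w * SCALE_VALUES[rating]
        weight_sum += w
    return total / weight_sum


ratings = {
    "circulation": "excellent",
    "scale with room": "medium",
    "originality": "poor",
}
print(work_sheet_score(ratings))                        # 2.0
print(work_sheet_score(ratings, {"circulation": 2.0}))  # 2.25
```

Doubling the weight on one principle raises the score here only because that principle was rated excellent; the same weight would lower the score of a student rated poor on it.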

The Work Sheets have the conspicuous advantage of permitting an
evaluation of the student’s ability to do one type of creative problem- 
solving—an ability not measured by the more conventional objective 
testing techniques. Since all students are presented with the same 
basic problem, work with the same basic principles, and present their 
solution at the same size and on the same paper, it is relatively easy to 
grade their achievement. At the same time there is considerable
opportunity for them to express individuality. It would be possible, 
of course, to prepare a set of sample solutions, assign a single scale 
value to each, and grade each solution by comparing it with the scale 
of samples. However, this method has one disadvantage—the teacher 
tends to judge on a few criteria, and to neglect others which may be 
equally important but less conspicuous. It is difficult, for example, 
not to be influenced by technical dexterity, even though the excellence 
of the technique may mask serious flaws in the quality of the design. 
Therefore, it seems desirable to rate any type of creative work on 
several or many criteria. This grading method has the further advan- 
tage of providing data useful for diagnostic purposes. 

Like the Work Tests, the Work Sheets prove useful as teaching 
devices. Progress in the art classroom is frequently hindered because 
the students (sometimes the teachers, too) do not know what they are 
trying to achieve. Although all fields of instruction have suffered from
this vagueness, art—being primarily non-verbal—has suffered more 
than most. It is of primary importance for both teacher and student 
to know what outcomes are expected from art problems. Stating the 
principles on the problem sheet aids in clarifying these outcomes. 


(g) Art Rating Scale 


The Art Rating Scale is a list of criteria for judging an art object. 
First the student gives his immediate reaction to the value of the whole 
art object, then evaluates special aspects of it. The form of the scale 
is indicated in the table on page 494. 

The advantages of this rating scale are: (1) To call attention to many
aspects of the art object; (2) to provide a better method of expressing
the worth of an art object than the typical descriptive phrases such as
“quite good,” “not so good,” “very good”; (3) to provide a method of
comparing directly the values of different art objects; for example, two
buildings; (4) to provide another measure of student growth in the
ability to evaluate art objects. Arbitrary values can be assigned to
each step on the scale, and a total score expressing the merit of any
work of art can thus be computed. To be sure, this total score would not
constitute an ultimate and final evaluation of the art object, for it is
hard to imagine a more ludicrous situation than beginning art students
trying to “grade” numerically the work of a recognized artist. But as
an experiment the Art Rating Scale has advantages in approaching art 
judgment from a new angle, and in focusing attention on many factors 

The Art Rating Scale has other advantages when used for teaching
purposes. For example, our students in the General College art class 
were assigned the problem of rating two campus buildings, both used for 
classrooms but of conspicuously different architectural merit. Before 
using the rating scale the students discussed the two buildings without 
the benefit of instructor, decided that one was better than the other, 
and that they need not use the scale. However, they were encouraged 
to try the scale, and their discussions at the next class period were of 
quite a different type. Rather than talking in general terms of
“better” or “worse,” they compared and contrasted the buildings on
many points. It seemed evident that our students had been led to 
observe in far greater detail, to discuss specific factors rather than to 
judge on one general impression. Briefly, the Art Rating Scale had 
encouraged careful observation and promoted discussion. 

Specific reactions:
A. Function: Does the art object fulfill its utilitarian function
   efficiently?
B. Expression: (What is the purpose, content, or meaning of the object?)
   1. Is its purpose clearly shown in its appearance?
C. Organization: (How are the different parts of the object combined?)
   1. Does it give you an impression of one idea?
   2. Is there enough variety to hold your attention?
D. Materials: (How have the materials been chosen [illegible])
   1. Are they suited to the purpose of the object?
   2. Have the natural beauties of the materials been used to
      advantage, etc.?

The Art Rating Scale has also proved useful in grading the creative 
work of art students either by the instructor or by other members of the 
class. Although it is more time-consuming than the “first impression” 
system of grading, it has the distinct advantage of controlling personal 
prejudice and of directing attention to many, rather than few, features. 
If a student’s work has been carefully checked on this rating scale, and 
both the work and the scale are returned to him, there is ample
opportunity for him to profit from his past work. With these data in hand
both teacher and student can diagnose the individual strengths and 
weaknesses in the student’s work, and the student can realize clearly 
what he is expected to accomplish. If laboratory work were graded in 
this manner, it might help eliminate the confusion behind the remark 
often made about such instruction: “What’s it all about?”


(h) Essay Tests 


While the essay test is not especially well suited to measuring the 
acquisition of knowledge, it is of great value in determining how well 
the student is able to recall and apply facts and principles. It shows
better than most new-type test techniques how well the student can
organize what he knows, but it is difficult to grade, and hence
usually an unreliable measure of student ability. However, the
essay test, both abused and over-rated, can serve a useful purpose in 
evaluating student work in an introductory art course. For example, 
at the beginning of one quarter the class members were asked to write a 
critical commentary on a painting exhibited in the Art Laboratory. 
This assignment was repeated at the end of the quarter. The students’
papers were graded on two criteria: first, the number of relevant ideas
expressed; and, second, the number of art terms used. Lists of both
ideas and art terms compiled from the papers were made and used as 
the basis for grading. By establishing specific criteria in advance and 
grading each paper with reference to them, we avoided the usual dan- 
gers of scoring essay tests entirely subjectively. The results of the 
second group of tests showed that a significant improvement in the 
ability to criticize paintings had taken place during a relatively short 
period of time. 
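
The second criterion, the number of art terms used, lends itself to a mechanical count, which may be sketched as follows. The term list and the two sample commentaries are invented for illustration, and counting each listed term once per paper is an assumption; the first criterion, the number of relevant ideas, still requires a reader's judgment.

```python
import re

def count_terms(essay, terms):
    """Count how many distinct listed terms appear in the essay (case-insensitive)."""
    text = essay.lower()
    found = set()
    for term in terms:
        # Whole-word match, so that "value" does not match "valued".
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
            found.add(term)
    return len(found)

# Hypothetical list of art terms compiled from the papers.
ART_TERMS = ["hue", "value", "intensity", "balance", "deep space"]

# Hypothetical commentaries by one student, first and last week of the quarter.
first = "I like the picture. The colors are nice."
second = "The hue is warm, the balance is formal, and deep space is suggested."

gain = count_terms(second, ART_TERMS) - count_terms(first, ART_TERMS)
print(gain)  # 3
```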

(4) Attitudes and Interests.—The attitudes and interests which a 
student brings to a beginning art class should be of the utmost impor- 
tance in determining the course content and methods. To ignore 
attitudes and interests is to lose two of the greatest motivating forces. 
But, unfortunately, it is exceedingly difficult to measure them with any 
degree of accuracy. 

Techniques for measuring immediate memory of factual informa- 
tion have been relatively highly developed, and the methods of evalu- 
ating the ability to apply facts and principles have been greatly 
improved during the past decade. But techniques for the measure- 
ment of attitudes and interests are less well developed, partly because 
the problem is more difficult, partly because interest in this phase of
education has been neglected until recently. In fact, this field of art 
measurement has hardly been explored and the suggestions reported 
below only sketch the first steps in a long program. 


(a) Attitude Tests 


To the writer’s knowledge there are no published tests of art atti- 
tudes, nor are there as yet suggestive studies reported in the literature. 
Therefore, the first task was the preparation of some sort of attitude 
scale. The preliminary attitude scale was composed of over one 
hundred statements covering the major issues discussed in the art 
course. The students were asked to record their reaction to each 
statement in terms of agreement or disagreement on a five-point scale.
The list of statements was also submitted to a group of art experts, 
and, although there was considerable agreement as to the most desira- 
ble responses, some criticisms were made, most of them questioning any 
attempt to standardize attitudes toward art. Further, the statements 
covering a wide range of reactions to many art products were so varied 
that the total score had little meaning. Consequently, this scale was 
not developed further. 

This experience indicated that if attitude scales were to be
constructed, they should measure some particular attitude, or a cluster
of attitudes. Following this idea, a scale to measure Progressive versus
Conservative attitudes toward art has been developed in two forms.
The statements in the first form are of the conventional type, as, for 
example: 

1. A tourist visiting Germany should spend his time looking at examples 
of modern rather than historic art. 


Strongly agree Agree Neutral Disagree Strongly disagree 


In the second form, a situation-response technique suggested by 
Alvin C. Eurich and developed by C. Robert Pace has been employed. 
The item equivalent to the one above is: 


1. Suppose that you are going to make a two months’ trip through Ger- 
many and, unfortunately, there is not time enough to see everything of 
importance. Germany is well known for its historic castles, old cities, and 
collections of historic paintings. There are also significant contemporary 
buildings, crafts, and paintings. How would you divide your time? 


_____ a. Spend all of my time seeing the historic art products.
_____ b. Spend most of my time seeing the historic art products.
_____ c. Divide my time equally between the two.
_____ d. Spend most of my time seeing the contemporary art.
_____ e. Spend all of my time seeing the contemporary products.


Data obtained from these tests will enable us to learn how progres- 
sive the students are when they enter the class and to understand better 
the effects of teaching on this particular attitude. As this work pro- 
gresses more tests of this type will be developed. 
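
One way in which responses on such a five-step scale might be turned into a single score is sketched below. The numerical values (5 for the most progressive response, 1 for the least) and the device of reverse-keying conservatively worded statements are assumptions made for illustration; they are not a description of the scoring actually used.

```python
# Assumed values for the five steps of the conventional form.
SCALE = {"strongly agree": 5, "agree": 4, "neutral": 3,
         "disagree": 2, "strongly disagree": 1}

def progressive_score(responses, conservative_items=frozenset()):
    """Sum the responses so that a high total means a more progressive attitude.

    Statements worded in the conservative direction are reverse-keyed.
    """
    total = 0
    for item, answer in responses.items():
        points = SCALE[answer.lower()]
        if item in conservative_items:
            points = 6 - points  # 5 becomes 1, 4 becomes 2, and so on
        total += points
    return total

# Hypothetical answers to three statements; statement 2 is conservatively worded.
responses = {1: "Agree", 2: "Strongly disagree", 3: "Neutral"}
print(progressive_score(responses, conservative_items={2}))  # 12
```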


(b) Observational Techniques 


In addition to formalized attitude tests, certain informal procedures 
have been of great value in determining the extent to which attitudes 
and interests have been developed or changed. Although it is not 
always possible to treat the data derived from these methods with 
great statistical precision, the results are useful in producing a more 
complete understanding of the student. 

At the University of Minnesota the General College Art Laboratory 
is open for approximately eight hours each day, although classes are in 
session for only two of these hours. This makes it a simple matter to 
keep a record of how much extra time each student spends in the 
laboratory, and these data are far better evidence of the interest which 
the student has in this aspect of the course work than any opinion or 
attitude expressed verbally. If the students find it worth while to do 
more work than is required, the instructor has a fairly reliable basis for 
assuming that the work interests the students. 

We are often asked which divisions of the course content interest the 
students most. Here, again, the work done in addition to the regular 
assignments is a good index of whether painting, architecture,
sculpture, or craftwork gives the greatest satisfaction to the students. It is
also possible to tabulate the laboratory work on the basis of the 
mediums employed, and in that way to determine which mediums are 
most popular. Tabulating such data as these is equally useful 
for investigating individual differences or group tendencies. Although 
it cannot be used as a basis for the determination of the student’s 
grade, it is a valuable check on the results of instruction. 
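
A tabulation of this kind may be sketched as follows. The log of voluntary laboratory pieces is invented; the same record serves both for group tendencies (which medium is most popular) and for individual differences (how much each student does).

```python
from collections import Counter

# Hypothetical log of voluntary laboratory pieces: (student, medium).
log = [("A", "clay"), ("A", "watercolor"), ("B", "clay"),
       ("C", "wood"), ("B", "clay")]

by_medium = Counter(medium for _, medium in log)      # group tendency
by_student = Counter(student for student, _ in log)   # individual differences

most_popular, count = by_medium.most_common(1)[0]
print(most_popular, count)  # clay 3
```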

Then, the art exhibitions brought to the University Gallery offer a 
striking opportunity to observe student reactions to original works of 
art. The popularity of each exhibit can be accurately estimated from 
the attendance records kept by the curator. The attendance of any 
group—for example, those enrolled in an introductory course—can be 
compared with that of a group not registered in the course. Whenever 
class meetings are held in the Gallery, the behavior of the students is
the best possible means of determining their likes and dislikes, and 
their conversation reveals to a large extent how much they have 
profited from their lectures, readings, and discussions. 

Another opportunity is offered by the University Gallery which has 
introduced a collection of original and reproduced works of art for the 
students to borrow much as they take books from the library. Records 
are now kept to determine which paintings are most popular, why they 
are taken, which students take them, and how long they are kept. 

Interests and attitudes may also be checked by the type of reading 
which students do in addition to their regular assignments. The 
General College has a reading room where a variety of books and 
periodicals is readily available. The student makes a record of the 
books read and the amount of time spent on each. The records, when 
analyzed, are a splendid indication of student reading interests. 

While these techniques are far more complicated than paper and 
pencil tests, their closeness to real-life situations is more than adequate 
compensation. Although education may place heavy emphasis on the 
acquisition of knowledge and the development of attitudes, it fails or 
succeeds primarily in the extent to which the knowledge and attitudes 
control behavior. What a man thinks is probably less important 
than what he says and does. Therefore, measures of doing, of overt 
behavioral patterns, may be the most direct way of evaluating the 
important outcomes of instruction. 


MEASUREMENT OF SPECIFIC OBJECTIVES 


The application of the evaluation techniques discussed above can 
probably best be explained by listing the specific objectives for the 
introductory art course, “Art Today,” offered at the General College.
Below each objective are listed two or three examples of this objective 
in terms of student behavior. Opposite the examples are suggested 
appropriate testing techniques. 


A. VOCABULARY

1. To Know the Meanings of Ordinary Art Terms.
   Example: Knowing the meanings of such terms as cul-de-sac, super-block, radial plan.
   Example: Knowing the meanings of such words as hue, value, intensity.
   Technique: Multiple-choice Items, Matching Items.

2. To Use Correct Terminology in Discussing Art.
   Example: Using such terms as abstract, realistic, naturalistic, accurately.
   Technique: Vocabulary Test described above.
   Example: Designating which of a group of paintings is most dynamic, abstract, realistic, etc.
   Technique: Series of paintings to be ranked in order of dynamic quality.


B. KNOWLEDGE OF FACTS AND PRINCIPLES

1. To Become Familiar with the Fundamental Problems Underlying All Art Activities and Products: The Human Problem, the Formal Problem, the Problem of Materials and Processes.
   Example: Knowing the problems which arise in planning a house.
   Example: Knowing the problems which arise in selecting and arranging pleasing forms.
   Example: Knowing the problems which arise in making clay into objects of daily use.
   Technique: Multiple-choice Items. Items in which a series of factors are to be ranked in order of importance. Matching Items. Essay Tests.

2. To Know the Fundamental Facts About Art in the Home, the Community, Commerce, Industry, and Religion.
   Example: Knowing the problem which the Industrial Designer encounters in designing furniture.
   Technique: Same as above.
   Example: Recognizing the different types of city plans.
   Technique: Matching Items in which sketches of the plans are paired with the correct term.

3. To Know the Fundamental Facts about Design Principles and the Plastic Elements.
   Example: Knowing the different types of balance.
   Technique: Matching Items in which sketches of the types of balance are paired with correct term.
   Example: Knowing what is meant by “deep space” in painting.
   Technique: Multiple-choice Items. Item in which the subject selects from a series the painting which has “deep space.”

4. To Know the Possibilities and Limitations of Materials.
   Example: Knowing that clay is well suited to round shapes.
   Example: Knowing that wood can be used effectively as a veneer.
   Technique: Multiple-choice Items, Matching Items.

5. To Know the Ordinary Techniques and Processes in the Field of Art.
   Example: Knowing the processes by which color printing is produced.
   Example: Knowing the basic principles of post and lintel construction.
   Technique: Multiple-choice Items. Matching Items.


C. APPLICATION OF FACTS AND PRINCIPLES

1. To Evaluate Art Products Wisely.
   Example: Choosing the best and poorest from three neckties.
   Technique: Order of Merit Items.
   Example: Stating reasons for the above choices.
   Technique: Multiple-choice Items. Subjective Items.

2. To Select Congruous Objects from a Miscellaneous Assortment.
   Example: Choosing a garden which is harmonious with the exterior of a house.
   Technique: Congruity Test (described above).
   Example: Selecting drapery materials to be used in a particular room.
   Technique: Items requiring the student to select a pattern or color (or both) for a room.

3. To Make Functional and Pleasing Arrangements of Art Objects in Everyday Life.
   Example: Hanging pictures so that they are pleasingly related to the background.
   Example: Arranging books, lamps, and accessories on a table.
   Technique: Order of merit items.


D. ATTITUDES AND INTERESTS

1. To Enjoy Art Products and Activities.
   Example: Deriving satisfaction from exhibitions of contemporary paintings.
   Technique: Recording attendance at exhibitions. Observing reactions at exhibitions.
   Example: Finding that arranging furniture for appearance and comfort is fun.
   Technique: Reports from students.

2. To View Contemporary and Historic Products with an Open Mind.
   Example: [illegible] modern houses without prejudice.
   Example: Recognizing the value in Gothic churches.
   Technique: [illegible] conservatism-liberalism.

3. To Recognize That There May Be a Variety of Solutions for Each Problem.
   Example: Recognizing that both Greek temples and Gothic churches are sincere religious buildings.
   Technique: Attitude Items in which the two are compared.
   Example: Recognizing that dinner plates may be made of clay, glass, metal, etc.
   Technique: Subjective Items.

4. To Recognize the Importance of Art in Everyday Life.
   Example: Realizing that the combination of clothing worn each day is important.
   Technique: Observing the clothes worn in class.
   Example: Being concerned with the furniture arrangement in one’s own room.
   Technique: Reports from the students.

5. To Recognize the Different Appeals Which an Art Object May Have.
   Example: Recognizing that a painting may have beauty of color as well as beauty of meaning.
   Example: Recognizing that the appearance as well as the efficiency of an automobile is important.
   Technique: Subjective Items.


STATISTICAL SUMMARY 


Having clearly defined course objectives, and having developed 
techniques and items which seem to measure such objectives, the 
educator next wishes to know whether or not these tests are functioning 
properly. First, is the test valid, that is, does it measure what it is 
supposed to measure? Second, is the test reliable, that is, does it 
measure what it measures consistently? Third, is the test suited to the 
group on which it is being used, that is, is the test too difficult or too 
easy, is the range of scores sufficiently large, etc.? 


VALIDITY 


Course Examinations.—The art testing program in the General 
College each quarter makes use of two general tests designed to measure 
insofar as possible the outcomes of the work of that quarter. These 
examinations are composed of items measuring knowledge of 
vocabulary, knowledge of facts and principles, ability to apply this

502 The Journal of Educational Psychology 


knowledge, etc. The techniques used are the customary objective 
test techniques: true-false, matching, completion and multiple-choice 
items. The tests require from one to two hours to administer. 

The validity of these tests is established by employing items which 
are direct expressions of the objectives stated for the course. Thus, if 
one of the objectives is “to know the possibilities and limitations of
materials,” one or more items on the characteristics, possibilities, and
limitations of clay, wood, and metal are developed; or, if one of the
objectives is “to become familiar with the graphic processes,” items are
developed to show whether or not the student is able to recognize 
different types of prints, etc. On all purely factual items, or those on 
which the opinion of experts is unanimous, we assume that a correct 
answer can be established, and, because careful attention has been paid 
to measuring the stated objectives of the course, we assume that the 
test so developed is a valid measuring instrument. 

Art Judgment Tests.—Inasmuch as the purpose of the art judgment 
or discrimination tests described above is somewhat different from the 
purpose of the course examinations, their validation received further 
consideration. These tests were developed not simply as tests to 
measure specifically the outcomes of one particular art course, but as 
tests of sensitivity to merit in art which might be used in a variety of 
situations. Intended for wider use, they are more generalized tests. 
A second difference separates them even more sharply from the course 
tests, and that is the behavioral patterns measured: judgment,
sensitivity, discrimination, or “taste.” Values in art are highly
disputed by both laymen and experts. The desirability of conformity 
of expert opinion is argued pro and con. As far as possible, the writer 
has not been concerned with the degree to which students and experts 
should be in agreement, but with developing instruments which would 
show definitely the extent to which they agree. Such instruments are 
of great value in investigating relationships among abilities, the range 
and conditions of individual differences as well as the results of teaching 
procedures. In other words, these art judgment tests are comparable 
to a surveyor’s tape or a butcher’s scales. Thus, if an art educator 
believes that after a period of instruction his students should agree 
more closely with expert opinion than they did before, he could employ 
these tests and should be pleased if the results were in the direction he 
wishes. But if, on the other hand, he believes that agreement with 
expert opinion is undesirable, he could use these tests to find whether 
or not he was achieving that aim. 

The method used for establishing the validity of the Owatonna Art
Judgment Test, the Painting Discrimination Test, the Sculpture Dis- 
crimination Test, the Architecture Discrimination Test, and the Con- 
gruity Test was substantially the same in each case. The criteria are 
as follows: 

(1) Test items which give an adequate representation of the art 
object—Preliminary work done on the Owatonna Art Judgment Test*® 
indicated that photographs were much superior to sketches or drawings 
as representations of art objects, and in consequence photographic 
reproductions were used. 

(2) Consensus of Art Experts.—The field of art should be no excep- 
tion in accepting the general belief that the expert is better qualified to 
pronounce judgment in his particular subject than is the non-expert. 
His native ability plus his training plus his experience should fit him 
to do the task. If they do not, he has little right to teach and guide 
others. From seven to thirty members of the art staffs at the Univer- 
sity of Minnesota judged all items in the discrimination tests, and only 
those items on which two-thirds or more agreed were included in the 
final tests.® 
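
The two-thirds rule of criterion (2) may be sketched as follows. The item names and the judgments are invented, and taking the most frequent expert answer as the keyed answer is an assumption; exact fractions are used so that an item with exactly two-thirds agreement is not lost to rounding.

```python
from collections import Counter
from fractions import Fraction

def consensus_key(judgments, threshold=Fraction(2, 3)):
    """Return {item: keyed answer} for items on which enough experts agree."""
    key = {}
    for item, votes in judgments.items():
        answer, n = Counter(votes).most_common(1)[0]
        if Fraction(n, len(votes)) >= threshold:
            key[item] = answer
    return key

# Hypothetical judgments by the experts on three test items.
judgments = {
    "item_1": ["A"] * 6 + ["B"] * 3,   # 6 of 9 agree: exactly two-thirds, kept
    "item_2": ["A"] * 5 + ["B"] * 4,   # 5 of 9 agree: dropped
    "item_3": ["B"] * 7,               # unanimous: kept
}

print(consensus_key(judgments))  # {'item_1': 'A', 'item_3': 'B'}
```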

(3) Group Differences.—A valid test of art judgment should differ- 
entiate between groups of subjects who are art majors and those who 
are not. This represents an extension of the above idea that experts 
are capable of superior judgments. Preliminary studies carried on 
with small groups indicate that these tests do show differences between 
groups of art and non-art majors in the college population.*® 

(4) Discriminating Power of Test Items.—The internal consistency 
of the test is assumed to be a further measure of validity. If an item 
is consistently missed by those making high scores on the test as a 
whole, it is logical to assume that the item is faulty. In many ways 
this is a development of the above criteria (2) and (3) since it is based 
on the differences between groups. It differs from criteria (2) and (3) 
in substituting the test which is being validated for an outside criterion. 
Thus, while the art experts and the art students gain their rating from 
their professional standing or the art courses which they have elected, 
this high-score group gains its standing from its performance on the 
test. All items which showed insufficient discriminating power
were discarded. 
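
The statistic used to judge discriminating power is not stated here. One common index, offered only as an illustrative assumption, is the proportion passing the item in the high-scoring group minus the proportion passing it in the low-scoring group; a value near zero or below zero marks an item that the better subjects miss as often as the poorer ones. The data are invented, and the split into halves is likewise arbitrary.

```python
def discrimination_index(responses, item, fraction=0.5):
    """Proportion passing `item` in the top group minus that in the bottom group.

    `responses` maps each subject to a dict of item -> 1 (right) or 0 (wrong).
    Groups are formed from total score on the whole test, the item included.
    """
    ranked = sorted(responses.values(), key=lambda r: sum(r.values()), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:k], ranked[-k:]
    p_top = sum(r[item] for r in top) / k
    p_bottom = sum(r[item] for r in bottom) / k
    return p_top - p_bottom

# Hypothetical answer records for four subjects on four items.
responses = {
    "s1": {"q1": 1, "q2": 1, "q3": 0, "q4": 1},
    "s2": {"q1": 1, "q2": 1, "q3": 0, "q4": 0},
    "s3": {"q1": 0, "q2": 0, "q3": 1, "q4": 0},
    "s4": {"q1": 0, "q2": 0, "q3": 0, "q4": 0},
}

print(discrimination_index(responses, "q1"))  # 1.0
print(discrimination_index(responses, "q3"))  # -0.5
```

In this invented example item q3 is passed only by a low-scoring subject, and would be a candidate for discarding.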

Art Attitude Tests —The problem of validating the measures of 
conservative-liberal attitudes toward art was somewhat different from 
that of validating the judgment tests. Here the problem was merely
to determine whether the alternatives indicated a progressive or con- 
servative attitude. Thus the experts were not asked for their own 
opinions or prejudices, but only to state what attitude the different 
responses implied. Only those items on which seventy-five per cent 
of the experts agreed were retained. Although it would be highly 
desirable to compare the scores on these tests with some measure of
overt behavior toward contemporary art, this phase of validation is not
finished.


RELIABILITY 


The coefficients of reliability computed by the split-half technique 
and corrected for length with the Spearman-Brown prophecy formula 
indicate that the tests are satisfactorily consistent as measures of 
educational achievement. Ranging from .88 to .96, these coefficients 
compare favorably with those of tests from other fields of education. 
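
The computation may be sketched as follows. The Spearman-Brown correction for a test of doubled length is r_full = 2 r_half / (1 + r_half); the division of the test into odd-numbered and even-numbered items is an assumption, since the manner of splitting is not stated, and the item scores are invented.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per subject (odd-even split assumed)."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even))

# A half-test correlation of .80 corresponds to a full-test reliability of about .89.
print(round(spearman_brown(0.80), 3))  # 0.889
```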


STATISTICAL SUMMARY OF ART TESTS

                                                  Highest
                                                  possible                       Standard
Test                                    Subjects  score     Range     Mean       deviation  Reliability
 1. Examination A                          48        79     25-70      46.19      10.68     .90 ± .02
 2. Examination B                          51        71      7-51      20.47       9.08     .89 ± .02
 3. Examination C                          65       124     32-101     62.50      15.14     .91 ± .01
 4. Examination D                          71       140     34-110     78.35      15.85     .93 ± .01
 5. Examination E                          72       200     50-145    103.3[?]    18.84     .89 ± .02
 6. Owatonna Art Judgment Test            200       114     23-100     64.22      14.64     .96 ± .004
 7. Painting Discrimination Test          200        90     18-66      39.42      10.72     .87 ± .01
 8. Sculpture Discrimination Test         200        90     15-71      36.15      12.70     .89 ± .01
 9. Architecture Discrimination Test       65        81     18-66      44.70       8.39     .90 ± .02
10. Congruity Test                        200        48     10-45      30.61       6.68     .87 ± .01
11. Art Attitude Test 1                   100       179     45-123     92.55      16.71     .88 ± .02
12. Art Attitude Test 2                    60       180     60-124    101.00      10.14     .91 ± .02


RANGE, MEAN, AND STANDARD DEVIATION 


These statistics give further evidence of the degree of efficiency with 
which each test is functioning. Ideally the scores would range from 
one to one less than the highest possible score with a mean of one-half
the highest score and a standard deviation of one-sixth the range. 
Practically, such conditions seldom if ever obtain, but some tests 
approximate these conditions much more closely than others. Thus 
Course Examination E (as reported above) approximates these condi- 
tions while Course Examination B, with a mean of only 20.47 out of a 
possible score of 71, is obviously too difficult for the group to which it 
was given. In a similar manner the Owatonna Art Judgment Test 
functions better than does the Sculpture Discrimination Test. 
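
The comparison with the ideal conditions can be made explicit by expressing the mean as a fraction of the highest possible score (ideal, one-half) and the standard deviation as a fraction of the obtained range (ideal, one-sixth). The sketch below applies this to three rows of the statistical summary; the function name and the rounding to three places are merely conveniences.

```python
def fit_to_ideal(max_score, low, high, mean, sd):
    """Ratios to set against the ideal values of 1/2 and 1/6 (about .167)."""
    return {
        "mean_over_max": round(mean / max_score, 3),
        "sd_over_range": round(sd / (high - low), 3),
    }

# Values taken from the statistical summary of art tests:
# (highest possible score, lowest score, highest score, mean, standard deviation).
tests = {
    "Examination B": (71, 7, 51, 20.47, 9.08),
    "Owatonna Art Judgment Test": (114, 23, 100, 64.22, 14.64),
    "Sculpture Discrimination Test": (90, 15, 71, 36.15, 12.70),
}

for name, row in tests.items():
    print(name, fit_to_ideal(*row))
```

On both ratios the Owatonna Art Judgment Test lies nearer the ideal than the Sculpture Discrimination Test, while the mean of Examination B falls well below one-half of the possible score.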


SUMMARY 


The varied outcomes of a general art course can only be measured 
adequately by many evaluation techniques, each one suited to its 
particular purpose. Multiple-choice, true-false, matching, and
completion items; order of merit tests; essay tests; and informal
observational techniques all have their place in a well-rounded, planned
evaluation program. No single test or technique is sufficient. Each
makes a contribution, and it should be the aim in an experimental 
program to discover the possibilities and limitations of different 
techniques for specific purposes. Organized on this basis, a testing 
program can be as creative, can offer as wide opportunities for a lively 
imagination as painting a picture or carving a statue. The limits are 
set only by the inventiveness of the person in charge of the testing 
program. The ever-present need for just evaluation of student 
progress and for better understanding of teaching efficiency should 
inspire each art instructor to explore and develop all possibilities. 


AN EXPERIMENTAL STUDY OF INDIVIDUAL
DIFFERENCES IN SCHOLASTIC MOTIVES*


STANFORD C. ERICKSEN†
University of Arkansas 


I. INTRODUCTION 


In the recognition and treatment of individual differences Progres- 
sive education has laid greatest emphasis on the individualization of 
instruction. This experiment is designed to show that individualiza- 
tion on the motivational side of learning is equally possible and 
important. 

This unbalance of emphasis is clearly seen on the college level 
by the great ado that is made over intelligence tests, aptitude tests, 
high-school grades and the like as indicators predicting college success. 
Yet it is a common observation that many students failing in their 
courses are certainly capable, according to these tests, of doing satis- 
factory academic work. The trouble may frequently be classed 
as a lack of motivation and interest. Nevertheless, in most of our 
colleges the administration usually offers only two specific forms of 
motivation: For a small per cent there is the lure of winning special 
honors and recognition, and for the rest of the students motivation 
consists chiefly in fear of probation or dismissal. 

Obviously it would be impossible to deal with all of the motives 
that students might have with respect to their many college problems 
and objectives. It was decided to select arbitrarily one more or less 
universal incentive which is well defined and toward which students 
have rather specific attitudes and feelings. The question chosen was: 
Why do students want to make good grades? The matter of making 
good grades is a major problem with most college students and one 
toward which they have rather definite feelings and opinions. Any 
differences that might occur in motivation toward this problem might 
be considered representative of individual differences that exist with 
respect to other college situations and objectives. 

The results of this experiment may have some practical value, 
but the questions that gave rise to the study were theoretical ones 
relating to the nature of individual differences on the motivational 
side of college work. Specifically the questions are of three types: 





* Research paper No. 707, Journal Series, University of Arkansas. 
† The writer is indebted to Delbert Bergenstal and Raymond Edwards who
spent many hours scoring and working out the statistical values in the experiment. 

(1) Are there distinctive differences among students in their reasons
for wanting to make good grades? (2) What may be the nature of 
these differences? (3) Is it possible to measure these differences on a 
quantitative scale? 


II. PROCEDURE


The subjects in the study were five hundred eighty-five college 
students. In Table I are listed ten reasons that are frequently given 
for wanting to make good grades. The subjects were asked to evaluate 
these by means of the paired-comparison technique (each statement is 
compared to every other statement). 
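
With ten statements the paired-comparison technique requires each subject to make forty-five judgments, one for every pair. The sketch below enumerates the pairs and tallies, for a group of subjects, the proportion preferring each statement to each other statement. The data layout (one record per subject giving the preferred statement for every pair) is an assumption made for illustration.

```python
from itertools import combinations

def all_pairs(n_statements):
    """Every unordered pair of statements numbered 1..n_statements."""
    return list(combinations(range(1, n_statements + 1), 2))

def preference_proportions(choices_by_subject, pairs):
    """p[(i, j)] = proportion of subjects preferring statement i to statement j.

    Each subject's record maps a pair (i, j) to the statement preferred;
    every subject is assumed to have judged every pair.
    """
    n = len(choices_by_subject)
    p = {}
    for i, j in pairs:
        i_wins = sum(1 for choices in choices_by_subject if choices[(i, j)] == i)
        p[(i, j)] = i_wins / n
        p[(j, i)] = 1 - p[(i, j)]
    return p

pairs = all_pairs(10)
print(len(pairs))  # 45
```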


TABLE I. STUDENTS’ RESPONSE TO THE QUESTION: “WHY DO YOU WANT TO
MAKE GOOD GRADES?”
I want to make good grades— 
(1) Just to meet the requirements for a degree. 
(2) To please my family. 
(3) To win special honors and recognition. 
(4) To win out in competition with some other person or persons. 
(5) To indicate that I am actually learning something: new facts, how to 
think, etc.
(6) To be eligible for initiation and student activities. 
(7) To secure a better future recommendation for a job. 
(8) It is a matter of little interest to me. 
(9) To uphold a reputation already gained among my associates and friends. 
(10) To gain the respect of my instructors. 
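
The paired-comparison design described above can be made concrete with a short sketch. It enumerates every pairing of the ten statements of Table I. The total number of judgments shown is an assumption: it supposes that each of the five hundred eighty-five subjects judged every pair once, which the text implies but does not state outright.

```python
from itertools import combinations

# The ten statements of Table I, identified by number.
statements = range(1, 11)

# Each statement is compared with every other statement exactly once.
pairs = list(combinations(statements, 2))
judgments_per_subject = len(pairs)

# Assumption: all 585 subjects judged every pair.
total_judgments = 585 * judgments_per_subject

print(judgments_per_subject)  # 45
print(total_judgments)        # 26325
```

Ten statements thus require forty-five choices from each subject, which gives some idea of the labor of discrimination asked of the students.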


The basis for judgment was to be the personal, subjective appeal 
that each statement had for the particular student making the choice. 
The scoring and final scale value for each statement was based on the 
procedure described by Thurstone¹ in his study of nationality
preferences. 

A separate analysis was made for eleven different student groups 
as well as for all the subjects combined. Table II indicates the specific 
comparisons made. The numbers in the parentheses show the number 
of subjects in each group. 


TABLE II.—THE DIFFERENT SUBGROUP COMPARISONS
1. Men (375) vs. Women (205).
2. Freshmen (93) vs. Juniors and Seniors (337).
3. Students with a C-plus or better accumulative average (63) vs. Students with
a C or lower average (115). (Freshmen not included.)
4. The Juniors and Seniors in each of the following colleges: Agriculture (62) vs.
Arts and Science (85) vs. Business (73) vs. Education (60) vs. Engineering (55).
5. Combined results of all the five hundred eighty-five subjects.




Individual Differences in Scholastic Motives 509 


The above arrangement necessitated the scoring and computing 
of scale values for twelve different sets of data; the eleven subgroups 
of Table II and one set of data representing the combined judgments of
all the five hundred eighty-five subjects. The first step was to make 
tables of proportions showing the number of times that any one state- 
ment was preferred to all the rest. Starting from these twelve tables the 
same series of computations was made as described by Thurstone. In 
translating the proportions into standard measurements the solution 
with case V was used. 
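
The computation just described can be sketched as follows. This is a minimal illustration of a Case V solution under stated assumptions, not the routine actually used in the study: the function name `case_v_scale`, the clipping of extreme proportions, and the three-statement table of proportions are all hypothetical. Each proportion is converted to a standard normal deviate, the deviates for each statement are averaged, and the least preferred statement is set at zero.

```python
from statistics import NormalDist


def case_v_scale(pref, clip=0.01):
    """Scale values from a table of proportions (Case V sketch).

    pref[i][j] is the proportion of judges who preferred statement i
    to statement j. Proportions are clipped away from 0 and 1 so that
    the normal deviate stays finite (an assumption of this sketch).
    """
    n = len(pref)
    inverse_normal = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        deviates = []
        for j in range(n):
            if i == j:
                deviates.append(0.0)  # a statement against itself
                continue
            p = min(max(pref[i][j], clip), 1.0 - clip)
            deviates.append(inverse_normal(p))
        scale.append(sum(deviates) / n)
    lowest = min(scale)
    # The least preferred statement is arbitrarily given the value zero.
    return [s - lowest for s in scale]


# Hypothetical proportions for three statements (not data from the study).
example = [
    [0.50, 0.70, 0.90],
    [0.30, 0.50, 0.80],
    [0.10, 0.20, 0.50],
]
values = case_v_scale(example)
print([round(v, 2) for v in values])
```

In this invented example the third statement is least preferred and falls at zero, and the other two are placed according to their distance from it.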


III. RESULTS


A. Final Scale Values Based on the Combined Judgments of All Five 
Hundred Eighty-five Subjects.—The statement that was least preferred 
was arbitrarily given a value of .000. Each of the other statements 
was then given a value according to its distance on the discrimination 
scale from this “‘zero’’ statement. The final scale values as determined 
by all of the subjects are represented graphically in Fig. 1. 

In this graph we have an objective representation of the way a 
heterogeneous group of five hundred eighty-five college students feel
about getting good grades. Of course it is not a complete answer. 
There are several obvious limitations: The judgments are restricted 
to ten arbitrarily chosen statements; the results are subject to errors
of sampling and test administration as well as the inaccuracies of this 
type of paper and pencil questionnaire. Nevertheless, these results 
do answer in part our original questions as to the nature of individual 
differences in motivation and the possibility of quantitatively scaling 
them. The spread of items along this motivation continuum is rather 
irregular. If we were working to the end of constructing an efficient 
measuring instrument it would be well to insert new statements in 
the large gaps between those items already on the scale and to eliminate 
some of those statements that are clustered together. (A small 
dispersion on the continuum indicates either that there is a lack of 
discrimination between the items or that they have equal weight for the 
subjects.) Such a refinement would involve no new theoretical or 
methodological questions but would necessitate the empirical evalua- 
tion of the new scale items. 

B. Results of the Eleven Subgroups.—For the purpose of this study 
the important thing is not the absolute rank order of the different 
statements but the relative ranking between the different subgroups. 
That is, how do the Men compare with the Women? the Freshmen with 
the Juniors and Seniors? the bright students with the poor students?
etc. We are interested primarily in the question of individual varia- 
tion in kinds of motives and whether these differences are quantita- 
tively demonstrable. The particular comparisons to be made were 
chosen arbitrarily on the basis of what seemed to be the more natural
and meaningful relations. That is, a comparison between the Men vs. 
Women would be more meaningful than one between a particular 
class group, e.g., Freshmen, and either one of the sex groups. 








[Figure 1. Scale values based on total cases (N = 585). Items from highest
to lowest scale value: 7, Recommendation for a job; 5, Actually learning
something; 2, Please the family; 4, Competition with others; 9, Uphold
reputation; 1, Just to graduate; 6, To be eligible; 10, Win respect of
instructor; 3, Win honors; 8, Have but little interest in grades (.00).]


In calculating the scale values for any particular group it is cus- 
tomary to give the item with the least preference the value of 0.00 
and to scale all the remaining statements according to their distance 
from this “zero” point (see Fig. 1). Item No. 8, “Grades are of
little interest to me,” was the least preferred of all the statements by
every group, yet its proportion-of-preference value varies from group 
to group, i.e., the Arts and Science students check it much more than
do the Agriculture students. Because of this irregular base line it
would be illegitimate to use the “zero” statement as a common point
of origin and compare directly the scale values of the remaining
statements. Table III lists the scale values for each of the subgroups.


TABLE III.—SCALE VALUES FOR THE ELEVEN SUBGROUPS

Agriculture       Arts and Science  Business          Education         Engineering
Item   Scale      Item   Scale      Item   Scale      Item   Scale      Item   Scale
       value             value             value             value             value
   8   0.00          8   0.00          8   0.00          8   0.00          8   0.00
   1   1.68          6   0.74          3   0.90          4   0.89          1   0.79
   6   1.75          9   0.84         10   0.93          3   0.98          6   0.87
   3   1.85          3   0.93          6   0.98          6   1.03          3   0.94
   2   1.90          1   0.95          9   1.05          1   1.09         10   1.02
  10   1.94         10   1.00          2   1.13         10   1.16          2   [?]
   9   1.98          4   1.03          1   1.19          2   1.25          9   1.18
   4   2.07          2   1.15          4   1.23          9   1.31          4   1.26
   5   2.34          7   1.40          5   1.33          5   1.53          5   1.47
   7   2.45          5   1.52          7   1.59          7   1.61          7   1.55

Women         Men           C-plus        C-minus       Freshmen      Juniors and seniors
Item  Scale   Item  Scale   Item  Scale   Item  Scale   Item  Scale   Item  Scale
      value         value         value         value         value         value
   8  0.00       8  0.00       8  0.00       8  0.00       8  0.00       8  0.00
   3  1.31      10  0.63       1  1.32       3  0.76      10  0.92       6  0.85
   4  1.38       3  0.72       6  1.44      10  0.86       1  1.04       3  0.92
  10  1.42       6  0.77      10  1.50       6  0.95       3  1.15       1  1.06
   1  1.47       1  0.84       2  1.76       9  0.99       9  1.24      10  1.12
   6  1.58       9  0.89       3  1.85       4  1.09       6  1.29       9  1.16
   9  1.63       2  0.95       9  1.91       1  [?]        2  1.45       4  1.21
   2  1.70       4  1.06       4  1.96       2  1.23       4  1.48       2  1.28
   5  1.88       5  1.27       5  2.12       5  1.32       5  1.57       5  1.55
   7  1.98       7  1.37       7  2.17       7  1.58       7  1.74       7  1.63

[?] marks a value that is illegible in the source.



































An analysis of these data will show the wide variation in absolute scale 
values between different groups, i.e., Men compared to the Women,
C-plus vs. the C-minus students. This difference is chiefly a function 
of the proportion of preference given statement No. 8. 
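
Since the absolute values depend on where the zero statement falls, the order of the statements within each group is the safer basis for comparison. The sketch below derives that order for the Men and the Women from the scale values of Table III. The rank correlation at the end is offered only as one gross index of agreement; the choice of Spearman's coefficient is an assumption of this sketch and is not a computation reported in the study.

```python
# Scale values copied from Table III (statement number -> scale value).
men = {8: 0.00, 10: 0.63, 3: 0.72, 6: 0.77, 1: 0.84,
       9: 0.89, 2: 0.95, 4: 1.06, 5: 1.27, 7: 1.37}
women = {8: 0.00, 3: 1.31, 4: 1.38, 10: 1.42, 1: 1.47,
         6: 1.58, 9: 1.63, 2: 1.70, 5: 1.88, 7: 1.98}


def rank_order(scale_values):
    """Statement numbers ordered from most to least preferred."""
    return sorted(scale_values, key=scale_values.get, reverse=True)


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings (no ties)."""
    n = len(order_a)
    rank_a = {item: r for r, item in enumerate(order_a, start=1)}
    rank_b = {item: r for r, item in enumerate(order_b, start=1)}
    d_squared = sum((rank_a[i] - rank_b[i]) ** 2 for i in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


men_order = rank_order(men)
women_order = rank_order(women)
rho = spearman_rho(men_order, women_order)

print(men_order)    # [7, 5, 4, 2, 9, 1, 6, 3, 10, 8]
print(women_order)  # [7, 5, 2, 9, 6, 1, 10, 4, 3, 8]
print(round(rho, 2))
```

The two orders agree at the extremes and differ chiefly in the position of statement No. 4, the element of competition.
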
Thurstone¹ suggests that comparisons between different groups
such as we have in this experiment can be made in terms of the correla- 
tion coefficient. But this type of analysis would give only a gross index 
of the differences and similarities between the eleven subgroups. 
In comparing these student groups we are primarily interested in 
qualitative differences that may exist in motivation toward the specific 
goal used in this experiment. A direct comparison of the rank order 
of the statements as determined by each subgroup should indicate 


TABLE IV.—RANK-ORDER OF STATEMENTS IN ELEVEN SUBGROUPS

                              Statements
Rank  Agriculture  Arts and  Business  Education  Engineering  Meaning
                   science
  1        7          5         7          7           7       Job recommendation
  2        5          7         5          5           5       Learning something
  3        4          2         4          9           4       Competition
  4        9          4         1          2           9       Reputation
  5       10         10         2         10           2       Please family
  6        2          1         9          1          10       Respect of instructor
  7        3          3         6          6           3       Win honors
  8        6          9        10          3           6       Be eligible
  9        1          6         3          4           1       Just to graduate
 10        8          8         8          8           8       Little interest

           TABLE V           TABLE VI            TABLE VII
                              Statements
Rank    Women    Men     C-plus  C-minus    Freshmen  Juniors and  Meaning
                                                      seniors
  1       7       7        7        7          7          7        Job recommendation
  2       5       5        5        5          5          5        Learning something
  3       2       4        4        2          4          2        Please the family
  4       9       2        9        1          2          4        Competition
  5       6       9        3        4          6          9        Reputation
  6       1       1        2        9          9         10        Respect of instructor
  7      10       6       10        6          3          1        Just to graduate
  8       4       3        6       10          1          3        Win honors
  9       3      10        1        3         10          6        Be eligible
 10       8       8        8        8          8          8        Little interest
whether motives show any variation between different types of student
groups. This type of comparison is presented in Tables IV, V, 
VI, and VII. Some of the more marked inversions and shifts in 
rank-order are indicated in these tables. A more detailed analysis 
can be made by a study of the scale values themselves given in Table 
III. On the basis of all these results the following qualitative inter- 
pretations can be made: 

C. Results in Terms of the Meaning of the Specific Statements 
Themselves.—(1) With the single exception of the Arts and Science 
group, the most important motive for wanting to make good grades was 
the hope for a better recommendation for a job. Generalizing from 
this one result it appears that students still consider a college education 
of value primarily for the economic benefits it may bring. 

(2) Statement No. 5 was in second position. At its face value 
this would indicate that next to getting a good job, students are really 
interested in learning how to think, in acquiring new information and 
in gaining an education in the traditional sense of the word. The fact 
that the Arts and Science students placed this item first on the list is an 
interesting inversion. If it is a valid item it indicates that these 
students are really suited to the usual cultural curriculum that the 
Arts college presents. However, it is probably true that this particular 
statement more than any other is subject to the element of students 
responding with what they ought to say rather than to their true 
motivational feelings on the matter. 

(3) Item No. 8, ‘‘Grades are of but little interest to me,’’ was 
included in the questionnaire for those students who for one reason or 
another claim to have but little interest in grades. Also, it might be 
interpreted as an index of the group feeling toward the matter of grades 
in general. The Agriculture groups gave it the lowest rating, the Arts 
students the highest value; which may mean that the Arts students are 
the least interested, of all the groups, in the matter of grades in general. 

(4) The students with below average grades are more concerned 
with such conditions as: Pleasing the family and making good enough 
grades to graduate than are the brighter students who, in turn, seem 
quite anxious to maintain the reputation for scholarship they have 
already gained. They are also more strongly motivated by the ele- 
ment of competition and by the prospects of winning special honors and 
recognition. 

(5) Compared to the Men, the Women are more anxious to please 
the family, to maintain a reputation already gained for scholarship, to 
be eligible for initiation and school activities, and to gain the respect 
of the instructor. On the other hand the Men are more highly motivated
by the element of competition for grades. On the basis of item
No. 8, the Men seem less interested in grades as such than are the 
Women. 

(6) The problem of eligibility for initiation and student activities 
is of paramount importance to the freshmen only. The upper class- 
men are more concerned about gaining the respect of the instructor. 
Whether this result can be taken at its face value or is the result of 
training in collegiate “‘apple-polishing”’ is a matter of speculation. 

(7) Compared to the other college groups, the students in Educa- 
tion place much less emphasis on the element of competition for grades. 
The problem of pleasing the family is of greatest importance for the 
Arts and Science students. This group is also most interested in 
winning the respect of the instructor. The Business students are 
lowest in this item. 

(8) Grades simply as a means to graduation and a job were given 
highest ranking by the Business students. This attitude is consistent 
with the relatively low value given by them to the notion of grades as 
indicators of the actual amount learned and training received. 


IV. DISCUSSION 


From a survey of the results as a whole the following conditions are 
at once apparent: 

(1) There is a marked variation from group to group in the particu- 
lar hierarchy of motives for wanting to make good grades. In every 
comparison there are wide shifts in the position of several of the items. 
The motives for good grades were certainly different, for example, for 
the Men than for the Women; for the brighter students as compared 
to the poorer ones; the Business students vs. Arts and Science students, 
and so on through all of the comparisons. Each distinctive student 
group has its own distinctive scale of motives. There is every reason 
to believe that the more individualistic this type of analysis becomes 
the more specific and personal is the pattern of motives. This is the 
most important general implication of the experiment. Motives are 
individualistic. The actual scholastic performance that any student 
may display is a function of variations both on the ability and on the
motivational side of his personality. If educational theory and educa- 
tional guidance are to adapt themselves to the actual psychological 
conditions of the students, they must recognize and make adjustments 
for these individual differences in motives. 
This experiment has made explicit only variations in motives with
respect to the specific goal of making good grades. Motives relating to 
other college objectives are probably equally variable in nature. What 
the particular nature of this variation might be must wait for empirical 
investigation. Individual differences in motives are easy to discover 
once the investigator is looking for them. For those working in practi- 
cal and theoretical education, a critical attitude toward these condi- 
tions of motivation should certainly be dominant. 

(2) The second condition seen from a study of Tables II-VII is the 
fact that despite the wide variation in the scale position of the different 
items, some of the statements consistently maintain the same relative 
position in all of the twelve groups. This is in part a function of the 
wording of the statements themselves and in part a common attitude 
on the part of the students. We would hardly expect statement No. 8, 
‘“‘Grades are of but little interest to me,”’ to be in any place but the last. 
Statements No. 7, ‘‘ For a better recommendation for a job,”’ and No. 5, 
“They indicate that I am actually learning something . . . ,”’ main- 
tain about the same relative position throughout. Such general state- 
ments as these are heavily loaded with traditional attitudes of parents 
and students alike. 

(3) The third major conclusion to be drawn from these data is 
based on the noticeable fact of the small separation between some of 
the items. There is a good deal of overlapping in the item discrimina- 
tions. The reliability of some of the rankings would be rather low. 
The explanation of this probably lies both within the individual himself 
and in the wording of the statements. The statements may have been 
vague in their reference: double-barrelled and overlapping in meaning. 
On the other hand, most of the students have probably never tried 
critically to evaluate their motives with regard to getting good grades.
It is a difficult mental task to discriminate between paired statements 
such as these. Emotional factors may have acted to lower the accu- 
racy of the discriminations. Motives such as are dealt with in this 
study are highly colored with feeling and emotion. When subject to 
rigorous rational analysis this affective component disappears, leaving
a comparatively barren statement of what may have once been a true
motivating factor. 


V. SUMMARY


Certain theoretical questions relating to the motivational side of 
scholastic performance were subject to experimental analysis. Specifically
the study was aimed at the question of individual differences
in student motives and the possibility of measuring these differences on 
a quantitative scale. Using the psychophysical method of paired- 
comparison, ten statements referring to, ‘‘Why do you want to make 
good grades?” were evaluated by five hundred eighty-five college 
students. The following conclusions are made: 

(1) It is quite possible to subject such intangibles as motives to a 
quantitative method of scaling. The method is that described by 
Thurstone in his study of nationality preferences. There are certain 
limitations which can largely be corrected with further refinements in 
the empirical procedures. 

(2) Scale values for eleven subgroups were calculated (Men, 
Women, Freshmen, Bright students, Poor students, etc.). There is a 
good deal of variation between the different student groups with 
respect to the scale value and rank order of the statements in the ques- 
tionnaire. Inversions in rank order and marked shifts in scale value 
are characteristic of every comparison made. 

(3) Some of the statements, nevertheless, hold about the same rela- 
tive ranking among the different groups. These statements are 
heavily loaded with traditional attitudes common to students, parents, 
and the public in general. 

(4) The scale separation between some of the items is rather small. 
This is probably due to both subjective factors and objective inade- 
quacies in the statements and the questionnaire itself. 

(5) Certain interpretations and generalizations are made based on 
the meaning of the statements themselves. These results have practi- 
cal significance for the educational counselor and administrator. 

(6) The implications of the study as a whole seem to point directly 
to the responsibility of the educational theorist as well as the practical 
worker to recognize individual differences on the motivational side of 
scholastic work. It is pointed out that student motives exist as 
complex patterns which are distinctive for the individual student. 
The policy of recognizing these differences and evaluating them should 
supplement the present emphasis on individualization with respect to 
abilities and aptitudes. 
THE RELATION OF PERSONAL HISTORY DATA TO
COLLEGE SUCCESS* 


E. J. ASHER AND FLORENCE E. GRAY 
University of Kentucky 


The object of this study was to determine if personal history data 
have any relation to college success and, if so, to see whether such data, 
when combined with ability test scores, would materially increase the 
accuracy of prediction of college success. Specifically, the problem 
involved (1) setting up a measure of college success, (2) the validation 
of the personal history items found on the Kentucky Personal History 
Blank which is filled out by University of Kentucky freshmen at the 
time of entrance to the University, (3) obtaining a personal history 
score for the most valid items on this blank, and (4) determining the 
relation of this personal history score to college success and to the 
scores on the University of Kentucky general ability test. 

The subjects used in this investigation were two hundred students 
who entered the College of Arts and Sciences of the University of 
Kentucky in September, 1930, and September, 1931. This group 
represented a sample of the seven hundred twenty-four freshmen who 
entered the college in 1930 and 1931. Three hundred and ten of this 
original group were eliminated for one of the following reasons: 
Changed from the College of Arts and Sciences to some other college 
in the University, had not filled out the personal history blank at time 
of entrance, left college before completion of one semester, transferred 
to some other institution and back again to the University, were part- 
time or special students, had not completed their college work at the 
time this study was made because they had dropped out of school for 
one or more semesters after entering. The remaining four hundred 
fourteen students were distributed according to the number of semes- 
ters they were in college as seen in Table I. This table also shows the 
distribution of the two hundred students in the sample used in this 
study. It will be noted that the sample includes half of the students 
in each group in the original distribution except for those who had not 
earned degrees in seven, eight, nine, or ten semesters. 

The first consideration in working with this group was the establish- 
ment of a measure of college success. The common point-hour ratio 
criterion appeared to the writers to be a relatively poor criterion 





* Paper read by the senior author at the Thirty-fourth Annual Meeting of the 
Southern Society for Philosophy and Psychology, April, 1939. 
because it fails to take into consideration the element of survival. It
fails to distinguish between those individuals who maintain a given 
point-hour ratio for one semester from those who maintain the same 
ratio, but for two, three, or more semesters. Freeman² has contended


TABLE I.—DISTRIBUTION OF FOUR HUNDRED FOURTEEN ARTS AND SCIENCES
STUDENTS ACCORDING TO THE NUMBER OF SEMESTERS THEY WERE IN
COLLEGE

                                       Number of   Number of students   Number of students
                                       semesters   in original group    in sample
Students who had not earned degrees.       1              44                  22
                                           2             124                  62
                                           3              30                  15
                                           4              44                  22
                                           5              16                   8
                                           6              15                   7
                                           7               7                   0
                                           8               4                   0
                                           9               1                   0
                                          10               1                   0
    Total                                                286                 136
Students who had earned degrees.           7               8                   4
                                           8              92                  46
                                           9              20                  10
                                          10               8                   4
    Total                                                128                  64
    Grand total                                          414                 200














that this element of survival alone is a more valid criterion of college 
success than marks. It was felt, however, that a criterion which 
incorporated a measure of the level or caliber of work (point-hour ratio) 
and some measure of survival (length of time the level of work was 
maintained) would be a more adequate criterion than one based upon 
survival alone or marks alone. To combine a measure of survival with 
scholarship standing (point-hour ratio) proved to be a rather difficult 
task. What was wanted was a single criterion score which would 
include the common point-hour ratio and at the same time some index 
of survival. Such a criterion score, it was reasoned, should be highest 
for the people who maintained a given standing in their college work for
a period of time sufficient to graduate, and lowest for those who main- 
tained a similar level of work, but for the shortest period of time. It 


was discovered that such a criterion could be obtained by the following 
formula: 


$$\text{College success} = \frac{\text{Total number of grade points}}{\text{Total number of hours carried}} \times \text{Number of semesters in college}$$


This formula proved satisfactory except in those cases where 
students required more than or less than eight semesters for graduation. 
To get around this difficulty the maximum number of semesters 
allowed in the criterion score was 8. If a student finished the work for 
his degree in seven semesters, he was given a “‘bonus”’ of one semester 
in order to raise his criterion score to the level it would have been had 
he continued in college for another semester. It was felt that such a 
student should not be penalized for being superior enough to finish 
work for a degree in seven semesters. If a student required more than 
eight semesters to complete his work for a degree, he was penalized 
double the number of extra semesters required. For example, if a 
student required ten semesters to earn a degree, his point-hour ratio 
was multiplied by six semesters which is four less than called for in the 
formula. Reference to Table I will show that these adjustments 
involved only eighteen students—four who finished work for a degree 
in seven semesters, ten who required nine semesters, and four who 
required ten semesters. 
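
The criterion and its adjustments can be restated as a short sketch. The function name `criterion_score` and its arguments are hypothetical, and one rule is an assumption that the text does not cover: a student who stayed more than eight semesters without earning a degree is simply held to the maximum of eight (no such student appears in the sample).

```python
def criterion_score(grade_points, hours_carried, semesters, earned_degree):
    """College-success criterion: point-hour ratio times semesters,
    with the adjustments described in the text."""
    point_hour_ratio = grade_points / hours_carried

    if earned_degree and semesters == 7:
        # "Bonus" of one semester for finishing a degree in seven.
        effective_semesters = 8
    elif earned_degree and semesters > 8:
        # Penalty of double the number of extra semesters required.
        effective_semesters = semesters - 2 * (semesters - 8)
    else:
        # Assumption: everyone else is capped at the maximum of eight.
        effective_semesters = min(semesters, 8)

    return point_hour_ratio * effective_semesters


# Hypothetical students, each with a point-hour ratio of 2.0.
print(criterion_score(240, 120, 10, True))   # ten semesters -> 2.0 x 6 = 12.0
print(criterion_score(240, 120, 7, True))    # seven semesters -> 2.0 x 8 = 16.0
print(criterion_score(30, 15, 1, False))     # one semester -> 2.0 x 1 = 2.0
```

The first case reproduces the example in the text: ten semesters are counted as six.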

Using the above formula with the adjustments just described, 
criterion scores were obtained for each of the two hundred subjects. 
The group was arranged from high to low according to criterion scores, 
and divided into five equal groups. All of the items of the Kentucky 
Personal History Blank which appear in Table II, except items 1, 21, 
23, 25, 26, 29, and 30, were then validated against these criterion scores 
by means of the Adkins-Toops¹ item validation technique.

Item 1 was correlated with the criterion by means of the correlation 
ratio. Items 21, 23, 25, 26, 29 and 30 did not lend themselves to a 
correlation item analysis. The tabulations of these items against the 
criterion scores were scrutinized to see if there were any differentiation 
between the criterion groups. No differentiation was apparent, so the 
items were dropped from further consideration. The coefficients of 
correlation for each item and in some instances for each answer to each 
520 The Journal of Educational Psychology
TABLE II.—PERSONAL HISTORY ITEMS, ITEM COEFFICIENTS, AND SCORING WEIGHTS

Item    Items                                               Answers       r's      Weights
number
   1    “Date of birth.” Tabulated by age within six        15-19         -.270       3
        months.                                             20 plus       ......      0
   2    “Father born in what state.” Tabulated Yes or No    Yes           -.065       0
        as to born in Kentucky.                             No            ......      1
   3    “Mother born in what state.” Tabulated Yes or No    Yes           -.067       0
        as to born in Kentucky.                             No            ......      1
   4    “Born in what state.” Tabulated Yes or No as to     Yes           -.215       0
        born in Kentucky.                                   No            ......      2
   5    “Have you ever earned any money of your own?”       Yes           -.216       0
                                                            No            ......      2
   6    “Describe in detail your most important             Any           -.065
        responsibility or opportunity for leadership in
        connection with your employment experience.”
        Tabulated Any or None.
   7    “What offices, prizes, honors, responsibilities or  Any           -.096       1
        opportunities for leadership have you had in        None          ......      0
        organizations in school, church, or outside
        (athletic, literary, musical, religious, military
        or any other)?”
   8    “In how many different towns have you attended      One            .026
        high school?” Tabulated One or More than one.       More
   9    “What members of your family have attended          Both           .276       2
        college?” Father, Mother. Tabulated Both, One,      One           -.023       1
        Neither.                                            Neither       -.191       0
  10    “Underline any who have attended the University     Both           .075       1
        of Kentucky.” Tabulated Both mother and father or   One            .076       1
        Either mother or father, or Neither.                Neither       -.195       0
  11    “Check the item of expense during this school year  None           .227       3
        for which you are to pay with money earned          Some          -.121       0
        yourself.” Tabulated None or Some or All.           All           -.132       0
  12    “Have you borrowed money except from parents for    Yes            .036
        this year’s expenses?”                              No
  13    “Do you expect to?” Yes for either 12 or 13 given   Yes           -.187
        a zero weight; No for both given weight of 1.       No
  14    “Is it necessary that you have employment during    Yes           -.192       0
        this college year in order to remain in school?”    No            ......      2
  15    “Check the amount of time you plan to give to       None           .171       3
        outside employment during the college year.”        Less          -.065       0
        Tabulated None, Ten hours or less, or More than     More          -.144       0
        ten hours.
  16    “Have you secured such employment?” Tabulated Yes   Yes           -.114
        or A part of it or No.                              Part           .025
                                                            No            -.070
  17    “Are you married?”                                  Yes           -.050
                                                            No
  18    “Is any one beside yourself dependent on you for    Yes           -.050
        support?”                                           No
  19    “Are you a church member?”                          Yes            .037
                                                            No
  20    “Did you skip any of the school grades or half      Yes            .156       2
        grades?”                                            No
  21    “If you did, which were they?” Tabulated
        Elementary, Junior High, Senior High.
  22    “Did you repeat any of the school grades or half    Yes           -.206       0
        grades?”                                            No            ......      2
  23    “If you did, which were they?” Tabulated
        Elementary, Junior High, Senior High.
  24    “In what high-school subjects did you receive a     One or more   -.193       0
        mark of failure or condition?” Tabulated as One     None
        or more or None.
  25    “After starting the first elementary grade, how
        many half years were you out of school before you
        graduated from high school?” Tabulated by actual
        amount of time listed.
  26    “How many weeks did you lose on account of illness
        last year?” Tabulated by actual amount of time
        listed.
  27    “What part of your expenses do you expect to earn   None           .063
        after graduation?” Tabulated None or Part or All.   Part           .034
                                                            All           -.007
  28    “Do you feel confident that No. 1 (vocation listed  Yes           -.014
        in a previous item) is what you expect to do?”      No
  29    “How long have you contemplated No. 1?” Tabulated
        by actual amount of time listed.
  30    “Father’s occupation.” Tabulated as Professional,
        Semi-professional and business, Skilled labor,
        Semi-skilled to slightly skilled labor, Common
        labor, Retired or dead.













item are shown in Table II. The significance of these items cannot be 
judged entirely from the size of the coefficients. It is rare to find a 
single item which yields a coefficient by the Adkins-Toops formula of 
much more than .40 with a criterion. A glance at item 9 reveals a 
positive coefficient of .276 for both parents attending college and a 
negative .191 for neither parent attending college. Here we have 
a differential between both attending and neither attending of .467.
The coefficients for items 1, 2, 3, 4, 5, 7, 9, 10, 11, 13, 14, 15, 20, 22,
and 24 were judged to be indicative of sufficient discrimination to be 
given numerical weights for the purpose of scoring the Personal History
Blank. These weights are given in the last column of Table II. The 
weights were assigned roughly on the basis of the amount of discrimina- 
tion between the criterion groups as revealed by the coefficients of 
correlation. Thus item 1 with a coefficient of −.27 was given a weight
of 3 for ages fifteen to nineteen and a zero weight for ages of twenty and
over. By actual trial it was found that this dichotomous scoring of item
1 yielded optimal results. Item 4 with a coefficient of −.215 for a
“yes” answer indicates that a student born in Kentucky is a bit less
likely to be successful in college than one born out of the State. The
“yes” answer to this item, therefore, was given a zero weight and the
“no” answer a weight of two.

A summary of these items reveals that the student at the University 
who is most likely to be successful as measured by the criterion is a 
student under twenty years of age, not born in Kentucky, whose 
parents were not born in Kentucky, whose parents attended college, 
who hasn’t earned any money of his own, who doesn’t have to work his
way through college or borrow money to get through, who has held 
important or responsible offices before coming to college, who has 
earned prizes, honors, etc., who has skipped one or more half grades in 
school, and who didn’t fail in any school subjects in elementary or high 
school. 

Using the weights shown in Table II, the fifteen items mentioned 
above were scored to yield a Personal History score. These personal 
history scores ranged for the two hundred subjects from 2 to 27 out of a 
possible 28, with a mean of 15.15 and a standard deviation of 5.74. 
These personal history scores were then correlated with the point-hour 
ratio-survival criterion and with the general ability test scores earned 
by the subjects at time of entrance to college. The object in calculat- 
ing these correlations was to see whether the personal history score 
when combined with the general ability scores would yield a more valid 
prediction of college success. 

The intercorrelations of the personal history scores, general ability 
test scores, and the criterion scores are given in Table III, together 
with the multiple correlation coefficient between the criterion and the 
personal history and general ability scores. It is significant to note 
that the personal history scores yield a positive coefficient of .398 with 
the criterion and this in spite of the fact that there are only fifteen 
items involved in the personal history score. If one could find fifteen 
additional items of equal validity, it is very likely that this coefficient
would be increased. The most significant finding is the near-zero
correlation (.067) between the personal history and general ability scores, and the
multiple correlation between these two scores and the criterion. It is 
obvious that the personal history score is getting at some factor or 
factors involved in college success that are not tapped by the general 
ability test. Studies of personality tests, interest blanks, and other 
kinds of ability tests have failed for the most part to show as low a 
correlation with intelligence tests or, if they did, to show any 
appreciable correlation with college success. 


TABLE III.—CORRELATION COEFFICIENTS BETWEEN PERSONAL HISTORY SCORES,
GENERAL ABILITY TEST SCORES, AND ORIGINAL CRITERION SCORES

                                                               r      PE
Criterion scores with general ability ....................... .362   .041
Criterion scores with personal history ...................... .398   .040
Personal history and general ability ........................ .067   .047
Multiple coefficient between criterion and personal history
  and general ability ....................................... .521
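The multiple coefficient in Table III can be checked from the three simple correlations. The article does not state which formula was used; the sketch below assumes the standard two-predictor formula, and the function name is hypothetical.

```python
import math


def multiple_r(r_c1: float, r_c2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_c1, r_c2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_c1**2 + r_c2**2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Table III entries: general ability .362, personal history .398,
# intercorrelation of the two predictors .067.
print(round(multiple_r(0.362, 0.398, 0.067), 3))
```

Because the two predictors are almost uncorrelated, the squared multiple coefficient is close to the plain sum of the two squared correlations, which is why the combination gains so much over either score alone.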











Since the coefficient of .362 between the criterion and the general 
ability test scores is much less than the coefficient generally obtained 
between the test scores and point-hour ratios for first-year students, it 
appeared that the element of survival in the point-hour ratio-survival 
criterion was responsible for the decrease in correlation. To check this 
hypothesis, point-hour ratios were computed for the two hundred 
subjects. The intercorrelations of the personal history scores, general 
ability test scores, and these point-hour ratios were computed together 
with the multiple correlation between the two scores and point-hour 
ratios as the new criterion. These correlations are shown in Table IV. 


TABLE IV.—CORRELATION COEFFICIENTS BETWEEN PERSONAL HISTORY SCORES,
GENERAL ABILITY TEST SCORES, AND QUALITY-QUANTITY RATIO CRITERION

                                                               r      PE
Criterion scores and general ability ........................ .517   .034
Criterion scores and personal history ....................... .309   .043
Personal history and general ability ........................ .067   .047
Multiple coefficient between criterion and personal history
  and general ability ....................................... .585
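The PE column is presumably the probable error of each coefficient. The article does not give the formula; a minimal sketch, assuming the common textbook form PE = 0.6745(1 − r²)/√N with N = 200 subjects, reproduces the tabled values to within about .001. The function name is hypothetical.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of a correlation coefficient (assumed textbook formula)."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)


# Correlations from Table IV, with the two hundred subjects of the study.
for r in (0.517, 0.309, 0.067):
    print(r, round(probable_error(r, 200), 4))
```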











The correlation of the general ability scores with point-hour ratios 
is .517. This is much nearer the coefficient which is generally obtained
and quite a bit larger than the .362 found with the point-hour ratio-
survival criterion. It is rather clear that the general ability test is a
better indication of the caliber of school work than it is of the length of 
time that such a level of work will be maintained. The personal 
history score, on the other hand, is a somewhat better indicator of 
survival than it is of the caliber of work. The personal history score 
correlates .398 with the point-hour ratio-survival criterion, and .309 
with the point-hour ratio criterion. The coefficient of .398 is slightly 
larger than the .362 found between the general ability test and the 
point-hour ratio-survival criterion. 

It is significant to note that the combined personal history-general 
ability score (multiple coefficient) is more closely related to both cri- 
teria than either the personal history or the general ability test score 
alone. The relative importance of the two scores in these multiple 
coefficients is seen in the beta coefficients. The beta coefficients for 
the survival criterion are .375 for the personal history scores, and .357 
for the general ability test scores. For the point-hour ratio criterion 
the betas are .276 for personal history scores and .489 for general 
ability test scores. It is clear that the personal history score, although 
obtained from only fifteen personal history items, increases the accu- 
racy of prediction of college success regardless of which criterion of 
college success is used. This result points to the possible fruitfulness 
of a further investigation of personal history data in relation to college 
or school success. 
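The beta coefficients can be recomputed from the tabled correlations. The sketch below assumes the standard two-predictor formulas, which the article does not state; the function name is hypothetical. Recomputed this way, the personal history betas agree closely with the printed .375 and .276, while the general ability betas come out near .34 and .50 rather than the printed .357 and .489. The difference may reflect rounding or transcription in the source and is not resolved here.

```python
import math


def beta_weights(r_c1: float, r_c2: float, r_12: float) -> tuple[float, float]:
    """Standardized regression weights for two predictors of one criterion."""
    denom = 1 - r_12**2
    beta_1 = (r_c1 - r_c2 * r_12) / denom
    beta_2 = (r_c2 - r_c1 * r_12) / denom
    return beta_1, beta_2


# Survival criterion: personal history .398, general ability .362.
b_history, b_ability = beta_weights(0.398, 0.362, 0.067)
print(round(b_history, 3), round(b_ability, 3))

# The multiple R implied by the betas: R^2 = beta_1 * r_c1 + beta_2 * r_c2.
print(round(math.sqrt(b_history * 0.398 + b_ability * 0.362), 3))
```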


SUMMARY 


(1) The items of the Kentucky Personal History Blank were 
validated against a criterion of college success which included, in 
addition to the common point-hour ratio, the element of survival. 

(2) Fifteen of the most valid items were weighted roughly according
to their correlation with the criterion, and scored to obtain a
“personal history score.”

(3) The personal history score was correlated with the common 
point-hour ratio criterion of college success and with the survival 
criterion. The resulting coefficients were .309 and .398, respectively. 

(4) The correlations of an intelligence test with these criteria were 
.517 and .362, respectively. 

(5) The multiple correlation coefficients for the personal history
scores and intelligence test scores against the two criteria were .521
for the survival criterion and .585 for the point-hour ratio criterion.











526 The Journal of Educational Psychology 


(6) The coefficient between the personal history scores and intelli- 
gence test scores was .067. 

(7) The personal history score is more closely related to the point- 
hour ratio-survival criterion than intelligence scores are, while the 
intelligence test scores are more closely related to the point-hour ratio 
criterion. 

(8) The personal history score when combined with intelligence 
test scores increases the accuracy of prediction of either criterion of 
college success. 


REFERENCES 


1. Adkins, Dorothy C. and Toops, Herbert A.: “Simplified Formulas for Item
Selection and Construction.” Psychometrika, Vol. 1, 1937, pp. 165-171.

2. Constance, Clifford L.: “Predicting Academic Achievement at the University of
Oregon.” Bull. American Association of Collegiate Registrars, Vol. 1, 1935,
pp. 18-29.

3. Freeman, Frank S.: “Predicting Academic Survival.” Jr. Ed. Research, Vol.
xxi, 1931, pp. 113-123.

4. Garrett, Henry E.: Statistics in Psychology and Education. Longmans, Green 
and Company, New York, 1937. 




CONSIDERATIONS RELATIVE TO THE SELECTION OF 
AN INDEX OF INTELLIGENCE¹


R. L. JENKINS 
New York State Training School for Boys, Warwick, New York 


This study was undertaken for the purpose of comparing the 
stability of the IQ and the PC (Heinis personal constant)² on the body
of material which has been assembled at the Institute for Juvenile 
Research. 

The study is confined to Binet test ratings. In an overwhelming 
majority of the instances the test used was the Stanford Revision. 
When the basal age was below three years, a combination of Stanford- 
Binet and Kuhlmann-Binet was used—the Kuhlmann-Binet below 
three year mental age and the Stanford Revision at and above the three 
year level. A few Kuhlmann-Binets and Herring-Binets were included. 

A canvass was made of case records at the Institute and there were 
selected for use, from the last fifteen thousand cases, all cases in which 
two or more Binet tests had been given at intervals of three months or 
more. All diagnosed or suspected cases of neurological or endocrine 
disease were excluded. There remained thirteen hundred fifty-three 
cases. To these were added four hundred twenty-one cases examined 
at the National College of Education and made available through the 
courtesy of Dr. Louise Farwell. The Institute material is heavily 
loaded with retarded children, and the material of the National 
College of Education, being derived preponderantly from the examina- 
tion of young children of superior intelligence, formed a valuable 
counter-balance. 

All adjacent tests were compared. If the child had been tested but 
twice the two tests permitted but one comparison. If the child had 
been tested three times there were two possible comparisons between 
adjacent tests, a comparison of the first test and second and a compari- 
son of the second and third. If the child had had five tests there were 
four possible comparisons. 
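The pairing rule described above, in which n tests yield n − 1 adjacent comparisons, can be sketched as follows. The function name and the sample scores are hypothetical.

```python
def adjacent_comparisons(tests: list) -> list[tuple]:
    """Pair each test with the one that follows it: n tests give n - 1 pairs."""
    return list(zip(tests, tests[1:]))


# Hypothetical IQ's from five successive examinations of one child.
iqs = [88, 91, 90, 94, 93]
pairs = adjacent_comparisons(iqs)
print(len(pairs))
print(pairs)
```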





1 In preparing this paper the author is indebted to Dr. Paul Schroeder, Director,
Institute for Juvenile Research, for permission and opportunity to organize this
study and to the Civil Works Service and Works Progress Administration for
clerical assistance in assembling the material.

2 Heinis, H.: “A Personal Constant.” Journal of Educational Psychology,
Vol. xvi, 1924, p. 97.

IQ’s were calculated on the basis of an adult mental age of fifteen
years, rather than fourteen years. Previous studies at the Institute 
had determined that the IQ was more stable if one assumed an adult 
mental age of fifteen years than if one assumed an adult mental age of 
fourteen years or of sixteen years. This fact has been recognized by 
Terman in the employment of fifteen years as an adult mental age with 
the New Stanford Revision. The revised method of calculating the 
IQ of individuals thirteen to sixteen, introduced by Terman, was not 
used in this study and, consequently, cannot be evaluated from this 
study. 
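The effect of the assumed adult mental age on the IQ can be illustrated with a short sketch. It assumes the conventional ratio IQ, 100 × MA / CA, with the chronological-age divisor capped at the adult level; the function name and the sample ages are hypothetical.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float,
             adult_age_years: int = 15) -> float:
    """Ratio IQ with the divisor capped at the assumed adult mental age."""
    divisor = min(chronological_age_months, adult_age_years * 12)
    return 100 * mental_age_months / divisor


# A six-year-old (72 months) with a mental age of five years (60 months).
print(round(ratio_iq(60, 72), 1))

# A twenty-year-old with a mental age of twelve years, under three adult levels.
for adult_level in (14, 15, 16):
    print(adult_level, round(ratio_iq(144, 240, adult_level), 1))
```

For subjects below the adult level the choice makes no difference; for adolescents and adults a lower assumed adult age gives a higher IQ for the same mental age.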

The comparisons were sorted according to the chronological and 
mental age of the children at the time of the first test. The number of 
comparisons by chronological and mental age is listed in Table I. 
There were, for example, a series of fifty-one comparisons in which the 
children were of chronological age six years (6-0 to 6-11) and mental
age five years (5-0 to 5-11) at the time of the first test. For this group 
a distribution of the difference between the IQ’s of the tests (IQ Test 2 
minus IQ Test 1) was prepared. The value of the median of this dis- 
tribution was 0. The median value was obtained for each similar dis- 
tribution. All such medians are entered in Table II. Those medians 
within the heavy lines are based on fifty or more comparisons. Those 
in squares are based on twenty-five or more comparisons. Those 
within the dotted line are based on thirteen or more comparisons. 
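The tabulation just described, a median IQ change for each combination of chronological and mental age at first test, can be sketched as follows. The record layout, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_iq_change_by_cell(comparisons):
    """Median of (IQ Test 2 minus IQ Test 1) per (CA years, MA years) cell.

    comparisons: iterable of (ca_years, ma_years, iq_first, iq_second),
    with ages taken at the time of the first test.
    """
    cells = defaultdict(list)
    for ca, ma, iq_first, iq_second in comparisons:
        cells[(ca, ma)].append(iq_second - iq_first)
    return {cell: median(diffs) for cell, diffs in cells.items()}


sample = [
    (6, 5, 80, 82),
    (6, 5, 85, 83),
    (6, 5, 78, 78),
    (7, 6, 90, 95),
]
print(median_iq_change_by_cell(sample))
```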

A similar treatment of the PC’s computed from these tests is con- 
tained in Table III. It is noteworthy that both the IQ and the PC of 
three-year-old children in our series tend to rise a few points. This 
tendency is apparent in children who are neither retarded nor advanced. 
It indicates that, viewed in the light of the mental growth of these 
children, the standardization of the scale at these lower levels is faulty, 
i.e., the test at these ages is more difficult than at the higher levels. 
This finding may be influenced by the fact that a bastard scale was
used, Kuhlmann below three years, Stanford above.

A group of twelve comparisons of adults ranging in age from 
eighteen to thirty years yielded a median increase of one point IQ and 
one point PC. 

There appears to be a slight tendency for superior IQ’s and PC’s to 
decline. This is not necessarily to be interpreted as evidence that the 
assumptions underlying the IQ and PC are faulty. Since the correla- 
tion between test and retest is imperfect, such a regression of superior 
and inferior ratings toward the mean on retest is natural. To
make adjustments so that it would not occur would be to introduce
distortion.

Tables II and III are not directly comparable since they are not in 
comparable units. Tables IV and V have, therefore, been prepared. 
Table IV is Table II translated into mental age unit equivalents of the 
IQ changes. These are the mental age equivalents at the age at the 
first examination. They indicate, therefore, how different the median
mental age at first test was from that which would have been necessary
in order to reduce the median change of IQ to 0. Table V contains a
similar translation of PC changes into mental age units.
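The translation of an IQ change into mental age units is not spelled out in the text. One reading consistent with the description is that the mental age needed at first test to leave the IQ unchanged is IQ₂ × CA / 100, so the equivalent is the IQ change multiplied by CA / 100. The sketch below rests on that assumption; the function name and sample values are hypothetical.

```python
def iq_change_in_mental_age_months(iq_change: float,
                                   ca_months_at_first_test: float) -> float:
    """Mental age equivalent (in months) of an IQ change at a given age."""
    return iq_change * ca_months_at_first_test / 100


# The same five-point change is worth more mental age months at ten
# years (120 months) than at four years (48 months).
print(iq_change_in_mental_age_months(5, 120))
print(iq_change_in_mental_age_months(5, 48))
```

This is why Tables II and III cannot be compared point for point: a given number of IQ points corresponds to a different amount of mental growth at each age.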

Careful comparison of Tables IV and V by various methods¹ indicates
what the reader may recognize by inspection, that:

(1) The IQ appears to be slightly superior to the PC for the predic- 
tion of mental growth of the younger children (up to eight years of age 
at first test in this material). 





1 Mean deviation regardless of sign by columns, weighted mean deviation 
regardless of sign by columns, number of instances in which deviation in Table IV 
exceeds that in Table V.

(2) The PC appears to offer no advantage over the IQ for the
children of the middle-age group (nine to thirteen years of age at first 
test in this material). 

(3) The IQ (unrevised) appears decidedly inferior to the PC for
predicting the mental growth of the adolescent group (fourteen to
seventeen years of age at first test in this material). The outstanding
tendency in this age group is toward a rise in IQ. A little consideration
should lead to the understanding that this tendency would be even
more pronounced had the adult mental age of fourteen originally
advised by Terman been used. However, it should be recognized that
our adolescent group is largely a dull group with inadequate representation
of average and superior intelligences.

Fig. 2.

(4) There is indication from an inadequate number of cases that the 
PC’s of very bright children (who are numerically poorly represented 
in this material) tend to drop sharply, and that the PC may be a poor 
prediction device for this group. 

We may at this point examine the rationale underlying the assumption
of a constant IQ and a constant PC.

The assumption that the IQ remains constant involves the assumption
that the mental growth curve is of the type depicted in Fig. 1, i.e.,
mental growth proceeds at a constant rate from birth until the adult 
mental age at fourteen, fifteen, or sixteen years and then suddenly 
stops. The slopes of the lines in Fig. 1 are the IQ’s of three theoretical 
individuals whose mental growth curves are illustrated. 

Underlying the Heinis personal constant is the assumption that the 
mental growth curve is of the type illustrated in Fig. 2. This is a 
growth of negative acceleration which is more reasonable than that 
illustrated in Fig. 1 in that it avoids the angulation when the individual 
reaches the adult level at fourteen or sixteen. As in Fig. 1 the growth 
curves for dull, average and superior individuals are different only in 
their vertical dimension. If we let di/dt equal the rate of mental
growth of an average child, then the rate of mental growth of the
superior child is K(di/dt), where K > 1; and the rate of growth of the child
of inferior intelligence is K(di/dt), in which K < 1. The PC is the ratio
of the mental growth units for the child’s mental age to the mental
growth units for his chronological age. For the PC to remain constant
it is necessary that (1) the Heinis growth curve of the average, inferior
and superior children be a satisfactory description of growth of
“general intelligence.” This necessitates (2) that the dispersion in
intelligence should be strictly proportional to the mental age, an
assumption also implicit in the IQ.
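The definition of the PC as a ratio of growth units can be sketched as follows. The growth curve used here is a hypothetical negatively accelerated curve with illustrative constants; it is not Heinis's published curve, whose constants are not given in this paper. The function names are likewise hypothetical.

```python
import math


def growth_units(age_years: float, ceiling: float = 100.0,
                 time_constant: float = 7.0) -> float:
    """Hypothetical negatively accelerated growth curve (illustrative constants)."""
    return ceiling * (1 - math.exp(-age_years / time_constant))


def personal_constant(mental_age_years: float, chronological_age_years: float,
                      growth=growth_units) -> float:
    """Ratio of growth units at the mental age to growth units at the chronological age."""
    return growth(mental_age_years) / growth(chronological_age_years)


print(round(personal_constant(8, 8), 3))    # average child
print(round(personal_constant(10, 8), 3))   # mental age ahead of chronological age
print(round(personal_constant(6, 8), 3))    # mental age behind chronological age
```

Because the curve flattens with age, a child who is two years advanced at eight scores a PC closer to unity than the corresponding ratio of ages, 10/8, would suggest.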

Both of these curves have one degree of freedom. We may consider 
their relation to another device for representing intelligence which 
preceded the publication by Heinis of his mental growth curve and his 
advocacy of the PC as an index of intelligence. The method referred 
to is the expression of mental test performance in terms of the sigma 
value of test score for the chronological age.¹

If we consider the bases underlying an assumption that the sigma
value of the mental test performance of a child will remain constant we
will find them less restrictive than the bases underlying the constancy
of the IQ and the constancy of the PC. If the IQ is constant, the
sigma value of the mental test performance will necessarily be constant.
If the PC is constant the sigma value of the mental test performance
will necessarily remain constant. On the other hand, if the sigma
value is constant it does not follow that either the IQ or the PC is
constant. The only assumption underlying the use of sigma value of
mental test performance as an index for prediction of mental growth is
the assumption that the relative status of children with respect to
intelligence remains constant. The assumption is that the rank order
of the children on the second test will resemble that on the first. This
assumption is implicit in the use of any index of intelligence for predictive
purposes. If the sigma value is determined in a manner to be
described, its value as an index for prediction does not depend even
upon the assumption of a normal distribution of intelligence.

1 The sigma value of the test performance of a six-year-old child is the deviation
of his test performance from the mean of the performances of six-year-old children,
expressed in terms of the standard deviation of the distribution of six-year-old
children. Such a sigma value is in itself an index of intelligence which may be
expected to remain fairly constant and which may be translated into terms
equivalent to a mental age if this be desired.

It is true that if the distribution of intelligence itself (as contrasted 
with the distribution of test scores) is skewed, then sigma units so 
determined will not be equal. For instance, the difference in intelligence
between the measures +1.0σ and +2.0σ might not be quantitatively
the same as the difference between +2.0σ and +3.0σ. This is a
matter of more academic than practical importance. The use of 
sigma values would avoid a field of controversy which appears non- 
sensical to a layman and which has resulted in some popular disrespect 
for mental testing, viz., the controversy over the adult mental age. It 
furthermore makes it possible to avoid the creation of fictitious mental 
ages above the normal adult level such as are involved in the Binet 
tests. 

In the standardization of test scores it would doubtless be safest to 
plot the sigma values of various test scores from the number of indi- 
viduals scoring above and below each point. Referring to the standard 
nomenclature of the normal distribution, x would be determined from 
p and q. Such a procedure would take care of any skewness of the
distribution resulting from inequalities in the spacing of test items. 
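Determining the sigma value from the proportions above and below a score amounts to taking the normal deviate that cuts off the observed proportion p. A minimal sketch with Python's standard library follows; the function name is hypothetical, and how individuals scoring exactly at the point are divided between the two counts is left to the user as an assumption.

```python
from statistics import NormalDist


def sigma_value_from_counts(n_below: int, n_above: int) -> float:
    """Normal deviate x for a score, from the counts scoring below and above it.

    p is the proportion below and q = 1 - p the proportion above.
    Both counts must be positive, since inv_cdf is defined only for 0 < p < 1.
    """
    p = n_below / (n_below + n_above)
    return NormalDist().inv_cdf(p)


# A score exceeded by 16 of 100 six-year-olds lies about one sigma above the mean.
print(round(sigma_value_from_counts(84, 16), 2))
print(round(sigma_value_from_counts(50, 50), 2))
```

Since only the rank position of the score within its age group enters the calculation, unequal spacing of test items does not affect the result.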

One additional possibility might be considered. If those children 
who are less intelligent develop not only less rapidly than the average 
child, but also cease their development at an earlier age, or at a later 
age, i.e., if bright, average and dull individuals do not reach their
mental maturity at the same age, then neither IQ, PC, nor any other 
index based upon a growth curve with but one degree of freedom can 
remain constant. It would be necessary in dealing with growth curves 
to distinguish the curves of bright, average, and dull individuals in two
dimensions. In undertaking this investigation the author considered
attempting this task should the material justify it. It is plain that 
there are no deviations in Tables II to V sufficiently systematic to 
justify such an undertaking on this material. Such a difference in 
mental growth curves in two dimensions would not, however, interfere 
with the use of sigma values for prediction. No error would be
introduced if sigma values were determined as has been suggested. If
such differences in two dimensions exist, it is difficult to conceive that 
intelligence can be normally distributed both in adults and in children 
of any given age group. This question need not trouble us, for the 
usefulness of sigma values for prediction does not depend upon the 
assumption of a normal distribution of intelligence. 

The question may be asked—why not use percentile ranks rather 
than sigma value? It may be pointed out that the difference in intelli- 
gence between the ninety-eighth and the ninety-ninth percentile is 
many times greater than the difference between the fiftieth and the 
fifty-first percentile. The deviation of intelligence from a normal
distribution can certainly not be great enough to produce a comparable
degree of distortion. 
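The size of the distortion introduced by percentile ranks can be illustrated under the assumption of a normal distribution: the sketch below compares the sigma distance spanned by one percentile step near the top of the distribution with one step at the middle. The variable names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

# Sigma distance covered by one percentile step at the top and at the middle.
top_gap = z(0.99) - z(0.98)
middle_gap = z(0.51) - z(0.50)
ratio = top_gap / middle_gap

print(round(top_gap, 3), round(middle_gap, 3), round(ratio, 1))
```

On this assumption a single percentile step near the ninety-ninth percentile covers roughly ten times the sigma distance of a step at the median.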

A further advantage of the use of sigma values is provided by the 
probability that the growth curves of the functions involved in various 
types of tests are not identical. It becomes necessary, therefore, in 
using an index dependent upon the assumption of a given type of 
growth curve, to ignore these differences or to determine the shape of 
the growth curve for each test. These difficulties are obviated by the 
use of the sigma value of test score. 

This problem was set up and this discussion was prepared prior to 
the publication of the revised Stanford Scales. Other matters
interfered with its completion and publication. It is noteworthy that
Terman has recognized the inadequacy of the growth curve, depicted 
in Fig. 1, by a revised method of calculating the intelligence quotient, 
which method assumes a growth curve of the type depicted in Fig. 3. 
This is obviously a compromise which seeks to retain the much- 
publicized IQ which the community has come to accept, but to modify 
the barbarous growth curve which underlies the IQ concept by forcing 
it in the direction of a curve of negative acceleration. By substituting 
two angles for one, the distortion involved is doubtless reduced, but a 
glance at the growth curve should satisfy anyone that it is a makeshift 
adaptation. While the actual error involved in its practical use may 
not be great, it is not a satisfactory theoretical formulation. 


SUMMARY AND CONCLUSIONS 


(1) The IQ appears to be slightly superior to the PC for the predic- 
tion of mental growth of the younger children (up to eight years of age 
at first test in this material). 

(2) The PC appears to offer no advantage over the IQ for the 
children of the middle-age group (nine to thirteen years of age at first 
test in this material). 

(3) The IQ (unrevised) appears decidedly inferior to the PC for 
predicting the mental growth of the adolescent group (fourteen to 
seventeen years of age at first test in this material). The outstanding 
tendency in this age group is toward a rise in IQ. A little consideration
should lead to the understanding that this tendency would be even 
more pronounced had the adult mental age of fourteen originally 
advised by Terman been used. However, it should be recognized that 
our adolescent group is largely a dull group with inadequate
representation of average and superior intelligences.

(4) There is indication from an inadequate number of cases that the
PC’s of very bright children (who are numerically poorly represented 
in this material) tend to drop sharply, and that the PC may be a poor 
prediction device for this group. 

Terman’s modification of the method of calculating the IQ is a 
makeshift modification of the assumptions underlying the IQ, forcing 
the growth curve in the direction of a curve of negative acceleration. 
While this modification, rather than substitution of another index, may 
have been strategically justified, it does not provide a satisfying 
theoretical formulation. 

Scientifically, little would be lost and many difficulties would be 
avoided by the substitution for the IQ of the sigma value of the test 
score for the subject’s age group.


THE RELATIVE POTENCY OF THE NURSERY SCHOOL
AND THE STATISTICAL LABORATORY IN BOOSTING
THE IQ*


FLORENCE L. GOODENOUGH AND KATHARINE M. MAURER 
Institute of Child Welfare, University of Minnesota 


In 1925 Woolley published an article purporting to show that 
attendance at the Merrill-Palmer Nursery School brought about very 
marked changes in the intelligence test scores of young children. 
These changes were not paralleled by the results of retests given 
to other children of comparable age who were on the waiting list. 
Although the study was based upon a relatively small number of cases, 
and such factors as differential acquaintance with the examiners or 
with materials and tasks similar to those used in the tests were not 
controlled, it nevertheless aroused much interest among those concerned
with problems of mental growth. However, later
studies from other institutions yielded inconsistent results. A few
seemed to lend more or less support to the Merrill-Palmer findings; 
others failed to show any measurable effect of nursery-school experi- 
ence. Whether or not the differences could be attributed to the par- 
ticular tests used for measurement—in some studies the 1916 Stanford- 
Binet was used, in others the 1922 Kuhlmann-Binet, and in still others 
the Merrill-Palmer Performance Scale—to special conditions of testing 
such as better rapport on the part of children who had attended nursery 
school or bias on the part of examiners who knew to which group the 
individual children belonged, or to differences in the educational 
regimes of the various schools could not be determined. 

In 1932 Wellman published the first of a long series of articles from 
the Child Welfare Research Station of the State University of Iowa 
that have been appearing in both scientific and popular journals since 
that date. These articles deal with the effect of environmental changes 
upon child intelligence. The environmental factors considered are of 
three general types: School environment, home environment, and 
institutional environment. Under school environment are included 
the nursery school (or preschool as it is called at Iowa), the elementary 
school and the high school. Home environment has been studied
chiefly in terms of the development of children placed in foster homes.
Two kinds of institutional environment have been studied. The first,
which consisted of residence in an orphanage at Davenport, Iowa, was
said to be very non-stimulating in its effect upon child intelligence,
while the second, residence in an institution for the feebleminded under
certain described conditions, was judged to be stimulating in its effect.

* Assistants in the preparation of these materials were furnished by the
personnel of the Work Projects Administration, official project #665-71-3-69,
sub-project #259.

In this paper, no attempt will be made to criticize these studies in 
detail. Such criticism has already appeared elsewhere. This study is 
merely a concrete illustration of the misleading conclusions that have 
resulted from a statistical practice that was begun in Wellman’s 1932 
study and which the Iowa authors continue to employ in practically all 
their investigations in spite of the fact that its mathematical indefensi- 
bility has been repeatedly pointed out. The procedure consists of 
classifying subjects on the basis of the intelligence quotients earned on 
the first test given and computing the mean change in intelligence 
quotient from initial to final testing for each of these groups separately. 
It is obvious that when this is done, statistical regression due to errors 
of measurement renders it mathematically certain that unless other 
factors are operating to obscure the results, the cases originally testing 
high will appear to lose and those originally testing low will appear to 
gain, since, owing to the fallibility of the measuring instrument, chance 
as well as true ability will play a part in determining the original 
grouping. When the chance errors are reassorted at the time of the 
second test, each group will “regress” toward its own true mean with
the result that those initially at the upper extremes, whose position was 
determined in part by real ability and in part by good luck, will appear 
to lose while those who, for analogous reasons, were initially at the
lower extreme will appear to gain. The amount of this regressive gain 
or loss will be the algebraic mean of the chance error for each group. 
Because the element of chance plays a much greater part in the mental 
test scores of young children than of older ones, the magnitude of the 
regressive shift at the early ages will be correspondingly large. If, 
moreover, as frequently happens in the case of young children, there is 
a general tendency toward better rapport at the time of the final than 
at the time of the initial test, with the result that the final mean of the 
entire group is shifted upward, the regressive “losses” of the upper
group may be largely or wholly masked. Their IQ’s will then show
little change while the “gain” of the lower groups will be much
increased, since the regressive shift is always toward the mean of the
second measurement.

The Iowa claims for the effect of environmental stimulation rest to a
very great extent upon the statistical fallacy just described. It is true 
that in some of their studies a small but measurable change in the 
general mean in addition to the large regressive shift has appeared. 
Unfortunately, however, particularly in the articles that have appeared 
in popular journals and newspapers and in the condensed accounts of 
their experiments that have been published in the scientific and educa- 
tional journals, the figures quoted have not been those derived from the 
population as a whole, but the highly misleading figures obtained from 
one or the other extreme. If the article in question deals with the 
stimulating effect of a “good” environment, the “gains” of the
initially low group are quoted; if it deals with the depressing effect of a
“poor” environment, the losses at the upper levels are stressed.

The reason given by the Iowa authors for continuing to follow this 
practice is their belief that the factor of statistical regression is at least 
not the main explanation for the tendency of the children initially 
testing low to gain and those initially testing high to lose. It must be 
agreed that their explanation has a certain amount of surface plausi- 
bility. Briefly expressed, the theory is that the direction and extent 
of the gain or loss is determined, not by the absolute character of the 
environment but by the contrast between a new environment and that 
to which an individual child had previously been subjected. However, 
this hypothesis would carry more weight had the authors seen fit to 
ascertain how much of the observed change in IQ could fairly be 
attributed to regression alone. If the change proved to be greater 
than this amount, then and only then would it be necessary to evolve a 
further hypothesis in order to explain the residual. Although the 
statistical procedure for making this computation is well known, it has 
not thus far been used in any of the studies from the Iowa laboratory. 
The authors continue to quote the uncorrected figures from whichever 
end of the distribution fits the current line of discussion. 
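
The computation referred to is not spelled out in the text. One common form of it is the ordinary linear regression prediction of the retest mean of a group from its first-test mean. The sketch below assumes that form; the function names are hypothetical, and the figures in the example are invented for illustration rather than taken from any study.

```python
def expected_retest_mean(group_mean_1, mean_1, mean_2, sd_1, sd_2, r):
    """Retest mean predicted for a group by linear regression alone.

    mean_1, sd_1: whole-population mean and SD on the first test.
    mean_2, sd_2: whole-population mean and SD on the retest.
    r: test-retest correlation in the whole population.
    """
    return mean_2 + r * (sd_2 / sd_1) * (group_mean_1 - mean_1)


def residual_change(group_mean_1, group_mean_2, mean_1, mean_2, sd_1, sd_2, r):
    """Observed change minus the change expected from regression alone."""
    expected = expected_retest_mean(group_mean_1, mean_1, mean_2, sd_1, sd_2, r)
    observed_change = group_mean_2 - group_mean_1
    expected_change = expected - group_mean_1
    return observed_change - expected_change


# Hypothetical figures, for illustration only.
expected = expected_retest_mean(85.0, mean_1=110.0, mean_2=115.0,
                                sd_1=15.0, sd_2=15.0, r=0.6)
residual = residual_change(85.0, 101.0, mean_1=110.0, mean_2=115.0,
                           sd_1=15.0, sd_2=15.0, r=0.6)
print(round(expected, 2), round(residual, 2))
```

In this invented case a group averaging 85 at first would be expected to average 100 at retest from regression and the general shift alone, so that of an observed 16-point rise only one point would remain to be explained by a further hypothesis.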

It has occurred to us that a re-computation of our own data on the 
effect of nursery-school training as recently published in the Thirty- 
Ninth Yearbook of the National Society for the Study of Education might 
serve as a concrete example of the fallacy involved. 

Tables I and II reproduced from the Yearbook article show that 
after periods of one and two years’ attendance at the University of 
Minnesota nursery school, the intelligence quotients of children on the 
Minnesota Preschool Test show, on the average, a measurable increase, 
but that a similar increase also occurs in children of comparable age 
TABLE I.—CHANGES IN MEAN IQ ON THE MINNESOTA PRESCHOOL TEST AFTER 
ONE YEAR OF NURSERY-SCHOOL TRAINING COMPARED WITH CHANGES FOR 
NON-NURSERY SCHOOL CHILDREN AFTER A YEAR’S INTERVAL¹ 

                         |        Nursery        |      Non-nursery      
Occupational             |     Age at test 1     |     Age at test 1     
group                    |  0-29 | 30-59 | Total |  0-29 | 30-59 | Total 
-------------------------+-------+-------+-------+-------+-------+-------
[illegible]   No. Cases  |     5 |    26 |    31 |     9 |    23 |    32 
              Test 1     | 108.5 | 114.4 | 118.5 | 108.6 | 114.2 | 112.7 
              Test 2     | 117.5 | 120.2 | 119.7 | 117.5 | 119.5 | 118.9 
[illegible]   No. Cases  |     6 |    22 |    28 |    22 |    42 |    64 
              Test 1     | 109.2 | 111.4 | 110.9 | 103.2 | 107.6 | 106.1 
              Test 2     | 115.8 | 113.4 | 118.6 | 105.5 | 111.9 | 109.7 
[illegible]   No. Cases  |    15 |    10 |    25 |     9 |    17 |    26 
              Test 1     | 108.5 | 106.5 | 107.7 | 108.1 | 103.7 | 105.2 
              Test 2     | 113.8 | 110.0 | 112.3 | 109.2 | 110.7 | 110.2 
Total         No. Cases  |    26 |    58 |    84 |    40 |    82 |   122 
              Test 1     | 108.8 | 111.9 | 110.9 | 105.5 | 108.4 | 107.6 
              Test 2     | 115.0 | 115.7 | 115.5 | 109.3 | 113.8 | 112.2 

1 Taken by permission from the Thirty-ninth Yearbook of the National Society 
for the Study of Education, Intelligence: Its Nature and Nurture. 


TABLE II.—CHANGES IN MEAN IQ ON THE MINNESOTA PRESCHOOL TEST AFTER 
TWO YEARS OF NURSERY-SCHOOL TRAINING COMPARED WITH CHANGES IN 
NON-NURSERY SCHOOL CHILDREN AFTER AN INTERVAL OF TWO YEARS¹ 

Occupational group       | Nursery | Non-nursery 
-------------------------+---------+-------------
[illegible]   No. Cases  |      15 |           9 
              Test 1     |   110.8 |       114.2 
              Test 3     |   118.5 |       124.2 
[illegible]   No. Cases  |      19 |          15 
              Test 1     |   108.0 |       107.5 
              Test 3     |   114.9 |       112.5 
[illegible]   No. Cases  |      17 |           5 
              Test 1     |   106.0 |       118.5 
              Test 3     |   112.5 |       112.5 
Total         No. Cases  |      51 |          29 
              Test 1     |   108.9 |       111.5 
              Test 3     |   115.1 |       116.1 

1 Taken by permission from the Thirty-ninth Yearbook of the National Society 
for the Study of Education, Intelligence: Its Nature and Nurture. 
and social status who have not attended nursery school when tested 
after a corresponding period of time. In both cases the change is best 
ascribed to the effect of practice in taking the test. Further examina- 
tion of these tables will show that children of superior home back- 
ground, as indicated by the occupational level of the fathers, make a 
slightly greater gain from initial test to retest than is true of those 
whose fathers belong to the lower occupational classes. The difference 
is small, but it is apparent both for the nursery-school group after either 
one or two years’ attendance at nursery school and for the control 
group without nursery school training when tested after corresponding 
intervals of time. 
Fig. 1.—Changes in IQ from first to second test according to paternal occupation 
(expressed in tenths of standard deviation from the mean of each distribution).¹ 


A similar pattern of differential change was observed in an earlier 
study by Goodenough in which the test used was the Kuhlmann-Binet 
(1922 Revision) and the retest was given after a mean interval of six 
weeks. The results are shown graphically in Figure 1. 

In the report of the earlier study we considered a number of possible 
explanations for the differential type of change shown by the various 
socio-economic groups. The explanation favored at that time, as 





1 Goodenough, Florence L.: The Kuhlmann-Binet Tests: A Critical Study and 
Evaluation, by permission of the publisher. The University of Minnesota Press, 
Minneapolis, Minnesota. 
expressed in the following quotation, still seems to us the most plausible 
of any that we have been able to make. 

“An additional factor to be considered is the comparative accuracy 
of the two tests. It is obvious that existent differences between groups 
will only be made apparent by means of a suitable measuring instru- 
ment, and that in so far as the results of the measurement are affected 
by chance, or by other factors unrelated to the general field of inquiry, 
both the means and standard deviations of the various subgroups will 
approach more closely to the general mean and standard deviation 
derived from the total distribution. If, therefore, factors which are 
unrelated to either variable enter into the results of one test more than 
the other, it is to be expected that, all other conditions being equal, 
that test which is less affected by adventitious factors will show the 
clearest separation between groups actually differing with regard to 
the trait in question. This is, of course, simply another way of saying 
that the correlation between two variables as obtained through the use 
of fallible measuring instruments can never, except by chance, be 
greater than that obtained through true measurement, and will 
ordinarily be appreciably lower. If an improvement in the measuring 
instrument used for one variable is brought about, either by an increase 
in reliability or in validity (with reference to the particular trait 
considered), the other variable remaining as before, an increase in the 
obtained correlation is to be expected. This increase in correlation 
involves a change in the slope of the regression line with a consequent 
increase in the absolute difference between the means of the positive 
and negative arrays. That such a difference in favor of the second of 
the two tests probably does exist in the case under consideration is 
indicated by three distinct sources of evidence: (a) The correlation 
between the ratings obtained for the first and second tests as compared 
to those on the second and third for fifty-six cases who were given a 
third test after an appreciably longer interval; (b) correlation between 
half-scales on each of the two tests corrected by the Spearman-Brown 
formula; and (c) the internal consistency of the separate items with the 
total, calculated by the method of biserial r. It will be shown that the 
second and third of these criteria show a distinct difference in favor 
of the second test, while the first shows no significant difference 
between the two in spite of the longer interval.”¹ 
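
The point made in the quotation about fallible measuring instruments can be put in numerical form with Spearman's attenuation relation, under which the correlation obtained between two measures equals the true correlation multiplied by the square root of the product of their reliabilities. The sketch below is offered only as an illustration; the reliabilities and the true correlation are hypothetical and are not taken from the study quoted.

```python
import math


def observed_correlation(true_r, reliability_x, reliability_y):
    """Correlation expected between two fallible measures (Spearman's relation)."""
    return true_r * math.sqrt(reliability_x * reliability_y)


true_r = 0.50   # hypothetical correlation between perfectly measured traits
# The reliability of one instrument improves; the other variable is unchanged.
before = observed_correlation(true_r, reliability_x=0.85, reliability_y=0.90)
after = observed_correlation(true_r, reliability_x=0.91, reliability_y=0.90)
print(round(before, 3), round(after, 3))
```

With these assumed values the obtained correlation rises when the reliability of the one instrument is increased, which is the change in slope of the regression line to which the quotation refers.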





1 Goodenough, Florence L.: The Kuhlmann-Binet Tests: A Critical Study and 
Evaluation, by permission of the publisher. The University of Minnesota Press, 
Minneapolis, Minnesota. 
Table III, which is taken from the same study, shows the correlation 
between half-scales corrected for total length by means of the 
Spearman-Brown formula. These figures were derived from three 
hundred cases (fifty boys and fifty girls at each age) selected to con- 
stitute a representative sampling of the Minneapolis population. It 
will be noted that for each of the six groups considered, as well as for the 
total, the correlations are higher on the second test than on the first. 
This indicates that the second test yielded a more stable measure of 
child performance—or at least one that showed greater internal con- 
sistency—than the first. The explanation is probably to be found in 
better rapport at the time of the second examination with the con- 
sequent elimination of some of the adventitious factors not related to 
the purpose of the test. 
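
For readers unfamiliar with the correction, the Spearman-Brown formula estimates the reliability of a test k times as long as the one on which a correlation was obtained; for half-scales k is 2. The sketch below shows the arithmetic only. The half-scale value of .75 is hypothetical and is not one of the figures underlying Table III.

```python
def spearman_brown(r, k=2.0):
    """Reliability predicted for a test k times as long as the one giving r."""
    return k * r / (1.0 + (k - 1.0) * r)


def half_from_full(r_full):
    """Invert the k = 2 case: half-scale correlation implied by a full-scale value."""
    return r_full / (2.0 - r_full)


r_half = 0.75                      # hypothetical half-scale correlation
r_full = spearman_brown(r_half)    # 2 * 0.75 / (1 + 0.75)
print(round(r_full, 3))
```

The inverse function is included only to show that the corrected values in a table of this kind can be traced back to the half-scale correlations from which they were derived.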

TABLE III.—RELIABILITY OF TOTAL SCALE AS DETERMINED BY THE CORRELATION 
BETWEEN HALF-SCALES USING THE SPEARMAN-BROWN FORMULA¹ 
(Main Experimental Group) 

              |         Age 2         |         Age 3         |         Age 4         
              |  Boys | Girls | Total |  Boys | Girls | Total |  Boys | Girls | Total 
--------------+-------+-------+-------+-------+-------+-------+-------+-------+-------
First test:   |       |       |       |       |       |       |       |       |       
  r           |  .853 |  .836 |  .845 |  .899 |  .913 |  .910 |  .816 |  .865 |  .854 
  P.E.        |  .026 |  .027 |  .020 |  .018 |  .017 |  .012 |  .029 |  .024 |  .019 
Second test:  |       |       |       |       |       |       |       |       |       
  r           |  .886 |  .929 |  .911 |  .921 |  .921 |  .921 |  .861 |  .892 |  .883 
  P.E.        |  .020 |  .016 |  .012 |  .016 |  .016 |  .011 |  .025 |  .019 |  .014 

1 Goodenough, Florence L.: The Kuhlmann-Binet Tests: A Critical Study and 
Evaluation, by permission of the publisher. The University of Minnesota Press, 
Minneapolis, Minnesota. 


Both the earlier study on the Kuhlmann-Binet and the later study 
reported in the Thirty-Ninth Yearbook agree, then, in showing a slightly 
greater “gain” in intelligence quotient for children who come from 
superior family backgrounds than for children of the lower classes. 
That this difference can be attributed largely or in part to a special 
kind of regressive phenomenon resulting from the greater accuracy of 
the IQ’s obtained at the time of the second testing seems apparent from 
the data presented. It is, of course, possible that a differential growth 
factor is also operative, but our data are insufficient to show whether 
or not this is the case. The point to be emphasized at present is that 
when cases are divided, not on the basis of initial IQ but according to 
some measured characteristic of the family background (which seems to 
us a preferable way of determining the contrast between home and 
school), those children for whom the contrast may be presumed to be 
greatest are the least affected by the change. Whether at school or at 
home, the children from the poorer homes gain less upon retest than 
those from better homes. Although the differences are small, they 
appear consistently in each of the three age groups studied in the 
earlier investigation, and after intervals of both one and two years in 
the later study. 


TABLE IV.—DATA FROM TABLE I GROUPED ACCORDING TO INITIAL IQ 

(N = number of cases; Test I = mean IQ at test I; Retest = mean IQ after 
interval of 1 year; Diff. = difference between mean IQ’s) 

                 |            Nursery             |          Non-nursery           |             Total              
IQ on first test |   N | Test I | Retest |  Diff. |   N | Test I | Retest |  Diff. |   N | Test I | Retest |  Diff. 
-----------------+-----+--------+--------+--------+-----+--------+--------+--------+-----+--------+--------+--------
Below 90         |   8 |  85.50 | 101.38 | +15.88 |   9 |  78.33 |  87.22 |  +8.89 |  17 |  81.70 |  93.88 | +12.18 
90-99            |   7 |  95.14 | 105.86 | +10.72 |  26 |  96.46 | 104.23 |  +7.77 |  33 |  96.18 | 104.58 |  +8.40 
100-109          |  18 | 104.06 | 114.72 | +10.66 |  40 | 105.28 | 111.65 |  +6.37 |  58 | 104.90 | 112.60 |  +7.70 
110-119          |  33 | 113.94 | 115.49 |  +1.55 |  27 | 114.30 | 115.93 |  +1.63 |  60 | 114.10 | 115.68 |  +1.58 
120-129          |  13 | 124.08 | 122.62 |  -1.46 |  12 | 123.25 | 125.75 |  +2.50 |  25 | 123.68 | 124.12 |  + .44 
Above 130        |   5 | 136.40 | 127.40 |  -9.00 |   8 | 135.12 | 127.88 |  -7.24 |  13 | 135.62 | 127.69 |  -7.93 
Total            |  84 |        |        |        | 122 |        |        |        | 206 |        |        |        









































The answer might, of course, lie in the peculiar character of the 
University of Minnesota nursery school. It is true that we make no 
claim to stimulating the intellectual growth of children. If, however, 
the Iowa theory that their results are not to be ascribed primarily to 
regression but to variations in contrast between the stimulating value 
of the different homes and that of the nursery school is to be accepted, 
it follows that subjecting our data to the same treatment as that used 
in the Iowa studies should not show a differential trend, since the 
“contrast effect” would not occur in the case of a school that does not 
influence mental growth either favorably or adversely. We have, 
therefore, reworked the data from the Yearbook study according to the 
Iowa pattern wherein the cases are grouped not according to paternal 
occupation but according to initial IQ. The results of these computa- 
tions are shown in Tables IV and V. 
TABLE V.—DATA FROM TABLE II GROUPED ACCORDING TO INITIAL IQ 

(N = number of cases; Test I = mean IQ at test I; Retest = mean IQ after 
interval of 2 years; Diff. = difference between mean IQ’s) 

                 |            Nursery             |          Non-nursery           |             Total              
IQ on first test |   N | Test I | Retest |  Diff. |   N | Test I | Retest |  Diff. |   N | Test I | Retest |  Diff. 
-----------------+-----+--------+--------+--------+-----+--------+--------+--------+-----+--------+--------+--------
Under 100        |   9 |  89.67 | 105.33 | +15.66 |   8 |  95.88 | 105.75 |  +9.87 |  17 |  92.59 | 105.53 | +12.94 
100-109          |  17 | 104.12 | 115.06 | +10.94 |   8 | 104.75 | 114.62 |  +9.87 |  25 | 104.32 | 114.92 | +10.60 
110-119          |  20 | 114.40 | 116.30 |  +1.90 |   6 | 116.33 | 119.50 |  +3.17 |  26 | 115.85 | 117.04 |  +2.19 
Over 120         |   5 | 126.80 | 125.60 |  -1.20 |   7 | 131.57 | 126.00 |  -5.57 |  12 | 129.58 | 125.83 |  -3.75 
Total            |  51 |        |        |        |  29 |        |        |        |  80 |        |        |        









































These tables show that when the fallacious statistical practices 
employed in the University of Iowa studies are duplicated, using for the 
purpose data that, when properly handled, show no effect of nursery- 
school training upon the intelligence quotient, results are obtained that 
do not differ materially from those reported from Iowa. Our conclu- 
sion is, therefore, that the Iowa statistical laboratory has played a far 
greater part in affecting the “intelligence” of children than has the 
Iowa nursery school, and that the differential pattern of gains and 
losses upon retest shown by children whose initial IQ’s fell at the 
extremes of the distribution is a statistical rather than an educational 
phenomenon. A similar difference appears not only in the test results 
of children attending a nursery school that makes no claim to improve 
intelligence, but also in the records of children remaining in their own 
homes, provided that the same misuse of statistical methods occurs. 
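
The pattern on which this conclusion rests can be checked by recomputing the differences in Table IV from the first-test and retest means printed there. The sketch below copies those means for the nursery and non-nursery groups; the label "110-119" for the fourth band is inferred from the neighboring bands, and the variable and function names are illustrative only.

```python
# Mean IQ at test I and after one year, copied from Table IV.
bands = ["Below 90", "90-99", "100-109", "110-119", "120-129", "Above 130"]

nursery = [(85.50, 101.38), (95.14, 105.86), (104.06, 114.72),
           (113.94, 115.49), (124.08, 122.62), (136.40, 127.40)]

non_nursery = [(78.33, 87.22), (96.46, 104.23), (105.28, 111.65),
               (114.30, 115.93), (123.25, 125.75), (135.12, 127.88)]


def changes(pairs):
    """Retest mean minus first-test mean for each initial-IQ band."""
    return [round(after - before, 2) for before, after in pairs]


for band, n_change, c_change in zip(bands, changes(nursery), changes(non_nursery)):
    print(f"{band:10s} nursery {n_change:+6.2f}   non-nursery {c_change:+6.2f}")
```

In both columns the lowest band shows the largest apparent gain and the highest band an apparent loss, whether or not the children attended the nursery school.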











AN EXPERIMENT IN IMPROVING THE PERSONALITY 
OF HIGH-SCHOOL SENIORS 


AUSTIN H. TURNEY AND FLOYD I. COLLINS¹ 


The University of Kansas 


There is at present little disagreement with the statement made by 
Burnham that “the supreme aim of education is the preservation and 
development of a wholesome personality in every child.” All teachers 
and administrators are interested in personality and in learning how 
they may better direct its development. Definitions vary, and there is 
little experimental evidence leading to procedures in ordinary classroom 
situations. 

This paper presents the results of an experiment in improving the 
personality of high-school seniors through the study of personality in a 
high-school course. 


METHOD 


The experimental factor was the study of personality in a high- 
school course in psychology devoted primarily to “the psychology of 
personality improvement.” The experimental group consisted of 
twenty-one seniors in the LeRoy, Kansas, high school. It was impos- 
sible to get an equal number of seniors as a control, and for this reason 
twenty-one juniors were used. The assumption that personality 
differences are not as closely correlated with age differences as are 
mental age differences seems to be valid according to our results and 
justifies the use of juniors as a control group. The groups were neces- 
sarily small, this being a small high school, but this permitted the use 
of a method which might not serve with large groups. 

The objectives of the course were set up as follows: 

(1) To secure the coöperation and develop the belief on the part of 
the student in the improvability of his own personality. 

(2) To formulate plans for the improvement of personality by 
analyzing personality into specific traits. 

(3) To provide the student with a knowledge of characteristic 
traits of individuals with strong, pleasing personalities. 

(4) To provide the student with a knowledge of psychological 
terms essential to the understanding of the literature in the field of 
personality. 





1 Based upon a Master’s thesis by Floyd I. Collins at the University of Kansas. 
(5) To prevent personality disorders and to develop individuals 
with wholesome personalities, adjusted to their environment, and with 
worth-while goals and a proper sense of values. 

The method used in class was directed study based upon planned 
units of psychology applied to the study of personality. These units 
were intended to provide the student with information that would 
enable him to recognize his own personality weaknesses and give him 
positive direction in developing a wholesome personality for himself. 
One difficulty was apparent from the beginning; namely, the lack of 
books written for high-school pupils. About twenty-five books were 
used as reference material supplemented by magazine articles. No 
particular claim is or could be made for the books selected. They 
included some rather difficult college texts and some popular books on 
personality or allied topics. 

The class met one hour daily during thirty-six weeks for study and 
discussion in a rather informal situation. The instructor provided 
each pupil with an analysis of the unit and with reference material. 
Each pupil was required to read and report on six books from the 
reference list. Most of them read more. 

At the beginning and at the end of the school year the following 
tests were administered to both the experimental and the control 
groups: 


Personality Inventory by Robert G. Bernreuter. 

Personality Schedule by L. L. Thurstone and Thelma Gwinn Thurstone. 
Character Sketches by J. B. Maller. 

Otis Self-administering Test of Mental Ability. 


As a check upon test results case-studies were made of all pupils in 
both groups. The case-studies were based upon careful observation, 
school records, and in some instances upon interviews. In a small high 
school such as this one, many opportunities for close acquaintance with 
the pupils are offered. 


RESULTS 


The two groups were practically identical in IQ, but, of course, the 
mental age of the seniors was greater (thirteen months average, to be 
exact). In the initial scores on the various personality schedules, the 
averages for the two groups were not identical; in some cases the con- 
trol and in some cases the experimental group average score indicated 
better adjustment; but, whereas the control showed no marked tendency 
toward improvement, the experimental group showed definite 
improvement in all but one measured trait or characteristic. 

Table I presents mean gains for the experimental and the control 
group on each subdivision of the Bernreuter test. It will be seen that 
all the differences between the mean gains are statistically significant. 


TABLE I.—MEAN GAINS ON EACH DIVISION OF BERNREUTER PERSONALITY TESTS 

                              | Experimental | Control |       |    SD |       D 
                              |        group |   group |     D | diff. | σ diff. 
------------------------------+--------------+---------+-------+-------+---------
Neurotic tendency             |         17.6 |     2.5 |  15.1 |   3.8 |    3.97 
[illegible]                   |         12.2 |    - .8 |  12.5 |   2.7 |    4.63 
Introversion-extroversion     |         14   |    -3.2 |  17.2 |   3.5 |    4.91 
Dominance-submission          |         16.8 |    - .8 |  17.1 |   3.5 |    4.88 
[illegible]                   |         19   |     3   |  16   |   3.2 |    5    
[illegible]                   |         14.7 |    -.09 | 14.79 |   4.1 |    3.6  
Table II presents our findings for the Maller Character Sketches. 
The gains made by the experimental group on the subdivisions of the 
Maller character test were greater than those made by the control 
group, but the difference was not statistically significant in one 
instance; namely, readiness to confide. On all other subdivisions of 
this test and on the summary score the difference is marked. 


TABLE II.—MEAN GAINS ON EACH DIVISION OF THE MALLER CHARACTER TEST 
AND FOR THE TOTAL OR SUMMARY SCORE 

                              | Experimental |     Control |             |    SD |       D 
                              |        group |       group |           D | diff. | σ diff. 
------------------------------+--------------+-------------+-------------+-------+---------
[illegible]                   |          6.8 |         1   |         5.8 |  1.5  |    3.86 
[illegible]                   |          6.3 |         1.5 |         4.5 |  1.3  |    3.46 
[illegible]                   |          5.6 |         1.5 |         4.1 |  1.2  |    3.41 
Personal adjustment           |          6.6 |         1.1 |         5.5 |   .69 |    9.16 
[illegible]                   |          3.0 | [illegible] |         2.9 |  1.06 |    2.73 
Readiness to confide          |          4   |         3   | [illegible] |   .49 |     .02 
Total                         |         29.2 |         7.2 |        22   |  4.17 |    5.27 




















In Table III are given the results for the Thurstone Personality 
Schedule, Neurotic Tendencies. The experimental group again shows 
definitely greater gains than the control group, even though the critical 
ratio is smaller than those presented in Tables I and II, with the 
exception noted. 


TABLE III.—MEAN GAIN FOR EXPERIMENTAL AND CONTROL GROUPS, THURSTONE 
PERSONALITY SCHEDULE, NEUROTIC TENDENCIES 

Experimental group gain ..........   7 
Control group gain ............... -2.2 
Difference (D) ...................  9.2 
σ diff. ..........................  3.7 
D / σ diff. ......................  2.48 
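
The last column of Tables I to III is the critical ratio, that is, the difference between the mean gains divided by the standard error of that difference. The sketch below reproduces the arithmetic for the Thurstone figures in Table III. The function name is illustrative, and no particular cut-off for significance is assumed, since the text does not state the one used.

```python
def critical_ratio(gain_experimental, gain_control, sd_diff):
    """Return (D, D / sd_diff) for two mean gains and the SD of their difference."""
    difference = gain_experimental - gain_control
    return difference, difference / sd_diff


# Figures from Table III (Thurstone Personality Schedule).
d, cr = critical_ratio(7.0, -2.2, 3.7)
print(round(d, 1), round(cr, 2))
```

The quotient comes to a little under 2.49, which agrees with the 2.48 printed in the table if the last figure there was dropped rather than rounded.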














A detailed summary of the case studies used to validate the test 
findings is impossible. It can be said that the case studies do show a 
marked degree of agreement with the test scores. 


DISCUSSION 


It is not necessary to assume that each of the traits measured by 
the instrument used or, for that matter, that any of them are “personality” 
traits to give this experiment value. Some definitions of 
personality might exclude part or all of them. Nevertheless, the char- 
acteristics measured are usually thought of as aspects of personality, 
and improvement in them would be considered desirable by most 
educators interested in guidance. Allowing for considerable doubt as 
to validity and reliability of the instruments, the consistency of our 
results and their agreement with the case studies leave little doubt that 
the young people in the experimental group made some improvement in 
their own adjustment. 

Of course, not too much can be claimed for such an experiment. It 
would be desirable to have two groups identical in respect to maturity 
and on initial measures of the personality schedules. However, the 
very definite differences in gains made, corroborated as they are by 
the case studies, could not be explained by any defect in pairing. 
They seem in large part to be definitely the result of the pupils’ own 
study and understanding. 

It might be assumed that the gains made by the experimental 
group were the result of familiarity with the tests and of a knowledge 
of the significance of the items in them. Two facts prevent the accept- 
ance of such an explanation. First, there was no emphasis upon 
measurement of personality, nor were the particular schedules available 
to the students. Second, the test results were corroborated by case 
studies of both experimental and control groups. These case studies 
indicate that the test results are valid. 
“SPEED VERSUS COMPREHENSION IN READING”— 
A DISCUSSION 


FRANCIS P. ROBINSON 
Ohio State University 


Professor Tinker recently made a study¹ of “the relation between 
speed and comprehension (1) by measuring rate of work and degree of 
comprehension on the same or strictly comparable material and (2) by 
employing as reading material tests ranging from very easy (‘no 
difficulty’ level) to extremely difficult material.” The correlations 
between rate and comprehension on these tests, in their order of 
increasing comprehension difficulty, produced the following sequence 
of mean coefficients: .93, .87, .84, .73, .51, and .48. After a discussion 
of various factors which affected these results, he concluded “material 
which requires a special background of training for its interpretation 
appears to lower the correlation,” e.g., the relationship between rate of 
reading and comprehension, and “the data warrant the conclusion that 
there is an intimate relationship between speed and comprehension in 
reading when the textual material is within the reader’s educational 
experience.” 

This relationship between rate and comprehension is usually 
thought of as reading carefully enough to comprehend the material well 
enough for the purpose at hand: As the comprehension is more difficult, 
small eye span, regressions and rereading slow down the rate; as the 
comprehension becomes easier, increased reading span and decreased 
regressions cause the rate to be faster.² Many factors affect both rate 
and comprehension, but the question as put by Tinker seems to be at 
what end of the comprehension scale are these other factors less 
important and the relationship of comprehension affecting rate most 
intimate. His conclusion is that this is greatest at the end of easiest 
comprehension. Since it is the writer’s belief that most of the experimental 
evidence denotes an opposite relationship, i.e., the more difficult 
the material, the more intimately it affects speed of reading, it 





1 Tinker, M. A.: “Speed versus Comprehension in Reading as Affected by Level 
of Difficulty.” J. Educ. Psych., Vol. xxx, 1939, pp. 81-94. 
2 Gray, C. T.: “Types of Reading Ability.” Suppl. Educ. Monog., No. 5, 1917. 
Judd, C. H. and Buswell, G. T.: “Silent Reading.” Suppl. Educ. Monog., 
No. 23, 1922. 
Walker, R. Y.: “The Eye Movements of Good Readers.” Psych. Rev. 
Monog. Suppl., Vol. xliv, No. 3, 1933, pp. 95-117. 
seems necessary to analyze the above results. But first it may be well 
to present some studies upon which the latter point of view is based. 
In a study of the eye movements (and rate) of superior readers on 
material of various degrees of difficulty Walker¹ found that they did not 
differentiate in reading span (extent of forward shift) between very 
easy (primer) and medium difficulty material. (The average extent 
was 4.31 ems in both cases.) Between medium and very difficult 
material they did make a marked differentiation. Thus, rather than a 
straight line relationship between reading span and comprehension 
difficulty, this indicates a plateau at the easier end and a later linear 
relationship when difficulty becomes greater. On easy material for 
good readers duration of fixation and interfixation time are very near a 
physiological limit and Walker’s results seem to indicate a functional 
limit to forward shift. Although decrease in difficulty seemingly 
should allow eye movement efficiency (rate) to increase, it levels off 
(except for decreasing regression) apparently because of a functional 
limit to facility in apprehension—unless skimming is adopted. Seashore, 
Stockford and Swartz² have termed this stage “speed of 
visual recognition” as opposed to “speed of comprehension” in 
attempting to explain their results on the effect of comprehension 
difficulty on rate (see below). Anderson³ carried through Walker’s set 
of experimental conditions with poor readers in college and made some 
interesting comparisons. Although the range of difficulty in the 
selections must have represented a functionally greater range for the 
poor readers, he found that the poor readers did not read (eye span) 
the three selections significantly differently! In this case it does not 
seem that we are dealing with an optimal reading span since it is 
subject to improvement with training.⁴ Usually associated with small 
span in poor readers are carry-over habits from oral reading, e.g., lip 
movement and word by word reading, or at least a failure to progress 
from word perception to phrase perception. Since training on increas- 
ing this span (without emphasis on comprehension improvement) may 





1 Walker: Ibid. 

2 Seashore, R. H., Stockford, L. B. O., and Swartz, B. K.: “A Correlational 
Analysis of Factors in Speed of Reading Tests.” School and Society, Vol. xlvi, 
1937, pp. 187-192. 

3 Anderson, I. H.: “Studies in the Eye Movements of Good and Poor Readers.” 
Psych. Rev. Monog. Suppl., Vol. xlviii, No. 3, 1937, pp. 1-35. 

4 Robinson, F. P.: “The Rôle of Eye Movements in Reading.” Univ. of Iowa 
Studies, No. 39, 1933. 
result in increased rate, it would seem to indicate that rate in poor 
readers is not entirely a product of speed of comprehension. 

Seashore, Stockford and Swartz¹ studied the relationship of rate to 
comprehension, to vocabulary and to intelligence on tests of varying 
degrees of comprehension difficulty. They found “a ‘power’ 
(unlimited time) test of intelligence (vocabulary) is not significantly 
related to any of our speed of visual recognition reading tests although 
it was closely related to a previous difficult test of reading comprehension 
under conditions of ample but fixed time.” They note further 
“that individual differences in speed of reading recognition of moderately 
difficult material is not significantly determined by the factors 
underlying either a reading comprehension test or a vocabulary test.” 
Their results led them to suggest that rate on easy reading material is 
determined by what might be called “speed of visual recognition” 
while rate on the more difficult comprehension tests is due to “speed of 
comprehension.” These suggestions fit in with the results of Walker 
and Anderson in indicating that at the more difficult levels of comprehension, 
eye movements (rate) are in great part a product of progress 
in comprehension, while at the easier levels for good readers and practically 
over the whole range for poor readers other factors probably 
accounting for what Seashore terms “speed of visual recognition” have 
increased importance in determining rate of reading. 

But what is the explanation of the high correlations between rate 
and comprehension on easy material as obtained by Tinker and of the 
reduction in these coefficients with increase in difficulty of comprehension? 
It is suggested that the very high correlations on easy (“no 
difficulty”) material are due to an artifact. Speed of reading is scored 
as the number of items tried within a time limit and comprehension is 
scored as the number right within this time limit. On the Chapman- 
Cook test, for instance, where accuracy of response is practically one 
hundred per cent the number right becomes entirely determined by the 
number tried (rate). Thus the coefficient (r = .997 between rate and 
comprehension on the same form) represents a correlation of rate with 
itself! Of course, it may be argued that speed of comprehension for
this material of “no difficulty” determines the number of items tried
(rate), but the evidence presented above indicates that this is not the
only factor (as the almost perfect correlation would seem to indicate) and
that probably the most important variable consists of non-comprehension factors





1 Seashore, Stockford, and Swartz: Ibid. 
determining “speed of visual recognition” and through it rate of reading.
Therefore, the better conclusion for the Chapman-Cook test is
that speed of visual recognition (rate) determines the number right 
more than progress of comprehension determines rate. 
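
The scoring artifact argued for above can be illustrated with a small simulation. This is only a sketch under stated assumptions and not a reanalysis of the Chapman-Cook data: the number of readers, the range of items tried, and the accuracy levels are hypothetical, and accuracy is assumed to be independent of rate. When every attempted item is answered correctly, the "number right" score is identical to the "number tried" score, so the coefficient is a correlation of rate with itself.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_readers, accuracy, seed=0):
    """Correlate 'number tried' (rate) with 'number right' (comprehension).

    Assumption: each reader attempts between 10 and 30 items within the
    time limit (hypothetical range) and answers each attempted item
    correctly with the same probability `accuracy`, whatever his rate.
    """
    rng = random.Random(seed)
    tried, right = [], []
    for _ in range(n_readers):
        t = rng.randint(10, 30)
        r = sum(1 for _ in range(t) if rng.random() < accuracy)
        tried.append(t)
        right.append(r)
    return pearson(tried, right)


# Easy material: accuracy is practically one hundred per cent.
r_easy = simulate(200, accuracy=1.0)
# Harder material: errors make the two scores diverge.
r_hard = simulate(200, accuracy=0.6)
print(f"easy material: r = {r_easy:.3f}")
print(f"hard material: r = {r_hard:.3f}")
```

In this sketch the coefficient for the easy material is unity by construction, while for the harder material it falls below unity solely because errors separate the two scores; nothing about comprehension has been modeled in either case.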

It might also be indicated that it is strange that when a variable is 
changed from “no difficulty” (no importance?) to increased difficulty, 
its relationship to rate decreases! What is the factor causing this 
reduction? The writer believes that in the tests called more difficult 
other uncontrolled factors (due to the method of scoring the tests) cause 
the lower correlations. This can be illustrated in terms of one of the 
tests used—the Iowa Silent Reading Test, Part I. The test consists 
of a selection with numbered phrases and at the side are questions to be 
answered by finding the correct phrase and placing its number in the 
answer space. Different methods of work will give different rate and 
comprehension scores, e.g., some read the selection through and then
start finding the answers, others first read a question and then look for
the answer; some skim in looking for phrases, others read and reread.
These different methods will cause students to reach different points
in the questions to be covered and have little to do with speed of comprehension.
Some students answer the questions in order, others skip
around, others leave some blank if not immediately found. How is one
to determine how many questions are attempted (rate)? In difficult 
comprehension tests regressions and much rereading occur but the rate 
score merely represents forward distance once covered. These spuri- 
ous factors lower the correlations on the more difficult tests. 

It is suggested, therefore, that while the correlations given by 
Tinker measure the relationship between rate score and comprehension 
score as defined by him for these tests, these variables, as scored, do not
always represent good measures either of rate or of comprehension. It is
felt that the other evidence presented indicates that rate is most 
intimately affected by speed of comprehension at the more difficult end 
of the scale. 

In discussing the relationship between rate and comprehension 
another aspect, i.e., comprehension accuracy and rate, should also be
presented. Seashore, Stockford, and Swartz¹ report that “speed may
vary independently of comprehension (accuracy) in our type of test.”
In an experiment conducted in our laboratory the correlations between 
rate for nine minutes and untimed comprehension accuracy on each of 





1 Seashore, Stockford, and Swartz: Ibid. 
the following selections: short story, art article, history article and 
geology article (rough order of difficulty) were .08, —.10, .00, and —.23, 
respectively, with a group of ninety-three college students. These low 
relationships probably arise from the intent of the reader to compre- 
hend to a degree to suit his purpose. As the material shifts in diffi- 
culty, the reader shifts his rate to attain the same desired degree of 
comprehension. Variability in comprehension accuracy with identical 
purpose would then be a matter of uncontrolled factors, although one 
study¹ has shown that poor readers are not able to maintain their 
proportionate performance to good readers when material increases too 
greatly in comprehension difficulty. 





1 Robinson, F. P. and McCollom, F. H.: “Reading Rate and Comprehension 
Accuracy as Determinants of Reading Test Scores.” J. Educ. Psych., Vol. xxv, 
1934, pp. 154-157. 
DR. ROBINSON ON SPEED VERSUS COMPREHENSION 
IN READING: A DISCUSSION 


MILES A. TINKER 


University of Minnesota 


The points made by Dr. Robinson¹ in discussing my article² on the 
relationship between speed and comprehension in reading seem to be 
based in part on misinterpretation and in part on a difference in view- 
point. Without attempting to discuss each point made by Robinson, 
a few remarks may help to clarify my own views and interpretations. 

Certain writers seem to believe that speed of reading is something 
which can be divorced from comprehension and still remain a valid 
measure of reading performance. A measure of the rate with which 
words are recognized as words, however, with no reference to apprehen- 
sion of the relationships and the meanings involved, yields a score of 
little or no significance in the reading situation. In other words, 
“reading” without comprehension is not reading. The only adequate 
definition of speed of reading is to consider it rate of comprehension. 
To measure speed of reading, therefore, one must measure the rate with 
which material is comprehended. Thus, in practice, it is important to 
know the rate with which history text is comprehended, etc. In the 
reading-test situation, rate of reading is rate of comprehending as 
measured in the test. One need not necessarily agree that a particular 
method is the best or preferred technique for measuring comprehension. 

Contrary to the notion held by many writers there is no general 
reading ability. There are ample data which demonstrate that reading 
proficiency is specific to the reading situation whether measured in 
terms of rate of comprehension or degree of comprehension. 

Robinson’s statement that the “relationship between rate and 
comprehension is usually thought of as reading carefully enough to 
comprehend the material well enough for the purpose at hand” is a 
misstatement of the situation. The whole development of research 
concerning this relationship has been centered around correlations 
between rate scores and comprehension scores.³ Since Robinson’s 
argument is centered about his own definition, his attempted criticisms 
have little bearing on the results and conclusions in my report. 





1 Robinson, F. P.: “Speed versus comprehension in reading: A discussion.” 
J. Educ. Psychol., October, 1940, pp. 554-558. 

2 Tinker, M. A.: “Speed versus comprehension in reading as affected by level of 
difficulty.” J. Educ. Psychol., 1939, Vol. xxx, pp. 81-94. 

3 Tinker, M. A.: “The Relation of speed to comprehension in reading.” School 
and Society, Vol. xxxvi, 1932, pp. 158-160. 
Robinson’s (unsubstantiated) view that “the more difficult the 
material, the more intimately it affects speed of reading” need not be 
considered as entirely out of harmony with my findings. He has 
apparently failed, however, to read my statement that the lower 
correlations between rate and comprehension at the difficult levels are 
probably due to lack of acquaintance with the textual subject-matter, 
method of response required, and method of scoring in measuring 
comprehension. Presumably, these factors have led to inconsistencies 
in rate of work from reader to reader. As a matter of fact, Robinson 
suggests that the reduction in size of coefficients at the difficult level is 
caused by “uncontrolled factors (due to the method of scoring the 
tests).” Thus he is advancing an explanation which is essentially the 
same as given on page 94 of my article. 

There is no statement in my report which says there is a straight- 
line relationship between rate of reading and difficulty of material as 
implied by Robinson. Hence, his arguments on this point are 
irrelevant. 

No writer familiar with the literature on reading would state that 
reading easy material is done without comprehension. Reading in the 
true sense of the term involves comprehension irrespective of the level 
of difficulty. Thus to label rate of comprehension in reading easy 
material “speed of visual recognition” can serve no useful purpose but 
may obscure the issue. 

Robinson’s statement that the very high correlations on easy 
material (Chapman-Cook Test) are due to an artifact because “speed 
of reading is scored as the number of items tried within a time limit and 
comprehension is scored as the number right within this time limit” is 
an unjustified interpretation of my results (pp. 84-85). The coeffi- 
cients emphasized were from correlating number right in a set time on 
one form of the test (comprehension), and time taken for all items on 
the other form (speed). The coefficients corrected for attenuation 
were —1.00. There is no artifact here. 
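
For readers unfamiliar with the correction for attenuation mentioned above, the usual form is Spearman's formula, in which the observed coefficient is divided by the square root of the product of the two reliability coefficients. The sketch below is illustrative only: the function name and the numerical values are hypothetical and are not taken from the report. The negative sign arises because one measure is number right in a set time (larger for faster readers) and the other is time taken (smaller for faster readers).

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction: estimate the correlation between true scores.

    r_xy : observed correlation between the two measures
    r_xx : reliability coefficient of the first measure
    r_yy : reliability coefficient of the second measure
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical figures chosen for illustration, NOT the values in the report.
observed_r = -0.90                # number right (one form) vs. time taken (other form)
reliability_comprehension = 0.90  # assumed reliability of the number-right score
reliability_speed = 0.90          # assumed reliability of the time-taken score

corrected_r = correct_for_attenuation(
    observed_r, reliability_comprehension, reliability_speed
)
print(f"corrected r = {corrected_r:.2f}")
```

With these assumed values the observed coefficient is as large as the reliabilities of the two measures allow, which is what a corrected coefficient at the limit expresses.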

Robinson admits that the correlations in my report do “measure the 
relationship between rate score and comprehension score as defined by” me. 
Practical measurement of reading proficiency at present is in terms of 
the standardized reading tests that are available. Most of us will 
agree that the measurement of comprehension may be inadequate. 
Until someone can devise a reliable and more satisfactory measure of 
comprehension, therefore, we must work with the measuring devices 
now available. 